├── Latex_template.zip ├── ProcessedData ├── CHEMPROT │ ├── dev.tsv │ ├── test.tsv │ ├── test_normal_final.tsv │ ├── test_overlapping_final.tsv │ └── train.tsv └── DDIExtraction2013 │ ├── dev.tsv │ ├── test.tsv │ ├── test_normal.tsv │ ├── test_overlapping.tsv │ └── train.tsv ├── README.md ├── Records └── record_76.56% │ ├── bert_config.json │ ├── eval_results.txt │ └── test_results.txt ├── Resources └── NCBI_BERT_pubmed_uncased_L-12_H-768_A-12 │ ├── bert_config.json │ └── vocab.txt ├── SourceCode ├── BioRE.py ├── BioRE_BG.py ├── file_utils.py ├── modeling.py └── modeling_BG.py └── model.jpg /Latex_template.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CongSun-dlut/CPI_extraction/cda9f2b7f6fb757b999744cc9a95be205e2cc265/Latex_template.zip -------------------------------------------------------------------------------- /ProcessedData/DDIExtraction2013/test_normal.tsv: -------------------------------------------------------------------------------- 1 | index sentence label 2 | DDI-DrugBank.d572.s1.p0 The incremental LDL-C reduction due to adding @DRUG$ to @DRUG$ may be reduced by this interaction. DDI-effect 3 | DDI-DrugBank.d572.s10.p0 Patients who take both @DRUG$ and @DRUG$ should be carefully monitored. DDI-advise 4 | DDI-DrugBank.d572.s11.p0 Carcinogenesis, Mutagenesis, Impairment of Fertility A 104-week dietary carcinogenicity study with @DRUG$ was conducted in rats at doses up to 1500 mg/kg/day (males) and 500 mg/kg/day (females) (~20 times the human exposure at 10 mg daily based on AUC0-24hr for total @DRUG$). DDI-false 5 | DDI-DrugBank.d572.s12.p0 A 104-week dietary carcinogenicity study with @DRUG$ was also conducted in mice at doses up to 500 mg/kg/day (>150 times the human exposure at 10 mg daily based on AUC0-24hr for total @DRUG$). DDI-false 6 | DDI-DrugBank.d572.s17.p0 In oral (gavage) fertility studies of @DRUG$ conducted in rats, there was no evidence of reproductive toxicity at doses up to 1000 mg/kg/day in male or female rats (~7 times the human exposure at 10 mg daily based on AUC0-24hr for total @DRUG$). DDI-false 7 | DDI-DrugBank.d572.s22.p0 In rabbits treated with @DRUG$, an increased incidence of extra thoracic ribs was observed at 1000 mg/kg/day (150 times the human exposure at 10 mg daily based on AUC0-24hr for total @DRUG$). DDI-false 8 | DDI-DrugBank.d572.s5.p0 Co-administration of @DRUG$ with @DRUG$ is not recommended until use in patients is studied. DDI-advise 9 | DDI-DrugBank.d573.s1.p0 If at all possible @DRUG$ should be discontinued well before @DRUG$ is begun. DDI-advise 10 | DDI-DrugBank.d575.s2.p0 Therefore, caution should be used in administering @DRUG$ (@DRUG$ injection) to patients receiving these other agents. DDI-false 11 | DDI-DrugBank.d575.s3.p0 There was no change in the plasma kinetics of @DRUG$ when coadministered with @DRUG$. DDI-false 12 | DDI-DrugBank.d575.s5.p0 An increase in intracellular levels of @DRUG$ was observed in vitro in the presence of @DRUG$. DDI-mechanism 13 | DDI-DrugBank.d576.s4.p0 @DRUG$ given concomitantly with @DRUG$ may predispose to systemic bleeding. DDI-effect 14 | DDI-DrugBank.d576.s5.p0 @DRUG$ may enhance the hypoglycemic effect of oral @DRUG$ of the sulfonylurea class. DDI-effect 15 | DDI-DrugBank.d577.s0.p0 In elderly patients concurrently receiving certain @DRUG$, primarily @DRUG$, an increased incidence of thrombopenia with purpura has been reported. DDI-false 16 | DDI-DrugBank.d577.s3.p0 @DRUG$ may inhibit the hepatic metabolism of @DRUG$. 
DDI-mechanism 17 | DDI-DrugBank.d577.s4.p0 At a 1.6-g dose, @DRUG$ produced a slight but significant increase in the half-life of @DRUG$ but did not produce a corresponding decrease in the metabolic clearance rate. DDI-mechanism 18 | DDI-DrugBank.d578.s1.p0 Mutual inhibition of metabolism occurs with concurrent use of @DRUG$ and @DRUG$; DDI-mechanism 19 | DDI-DrugBank.d578.s10.p0 The effect of @DRUG$ on oral @DRUG$ is variable. DDI-false 20 | DDI-DrugBank.d578.s11.p0 There are reports of enhanced as well as diminished effects of @DRUG$ when given concurrently with @DRUG$. DDI-effect 21 | DDI-DrugBank.d578.s3.p0 convulsions have been reported with concurrent use of @DRUG$ and @DRUG$. DDI-effect 22 | DDI-DrugBank.d578.s7.p0 @DRUG$ may increase the clearance of chronic high dose @DRUG$. DDI-mechanism 23 | DDI-DrugBank.d578.s9.p0 @DRUG$ should be used cautiously in conjunction with @DRUG$ in patients suffering from hypoprothrombinemia. DDI-advise 24 | DDI-DrugBank.d580.s1.p0 However, @DRUG$, a potent inhibitor of cytochrome P450 3A4, may increase plasma levels of @DRUG$ during concomitant dosing. DDI-mechanism 25 | DDI-DrugBank.d581.s2.p0 patients receiving @DRUG$ and @DRUG$ should have more frequent monitoring of neutrophil counts. DDI-advise 26 | DDI-DrugBank.d584.s5.p0 A 5.5-fold decrease in the mean @DRUG$ dose from 6.13 mg/day to 1.13 mg/day (approximately 80-85% reduction of @DRUG$ dose), was necessary to maintain a target INR of 1.5. DDI-false 27 | DDI-DrugBank.d584.s7.p0 Furthermore, in patients receiving both drugs, careful monitoring of the INR or PT, and adjustment of the @DRUG$ dosage if indicated are recommended when the @DRUG$ dose is changed or discontinued. DDI-advise 28 | DDI-DrugBank.d585.s1.p0 @DRUG$ antagonize the effects of @DRUG$. DDI-effect 29 | DDI-DrugBank.d585.s2.p0 @DRUG$ in the presence of increased intraocular pressure may be hazardous when taken concurrently with agents such as @DRUG$. DDI-effect 30 | DDI-DrugBank.d585.s3.p0 @DRUG$ may affect gastrointestinal absorption of various drugs, such as slowly dissolving dosage forms of @DRUG$; DDI-mechanism 31 | DDI-DrugBank.d585.s5.p0 @DRUG$ may antagonize the effects of drugs that alter gastrointestinal motility, such as @DRUG$. DDI-effect 32 | DDI-DrugBank.d585.s6.p0 Because @DRUG$ may interfere with the absorption of @DRUG$, simultaneous use of these drugs should be avoided. DDI-mechanism 33 | DDI-DrugBank.d588.s12.p0 @DRUG$: Efficacy may be decreased due to urinary alkalizing effect of @DRUG$. DDI-effect 34 | DDI-DrugBank.d588.s6.p0 @DRUG$ or @DRUG$: May increase the risk of hypokalemia and increase salt and water retention. DDI-false 35 | DDI-DrugBank.d588.s7.p0 @DRUG$: Serum @DRUG$ levels may increase. DDI-false 36 | DDI-DrugBank.d590.s1.p0 Coadministration with @DRUG$ results in a slight decrease in the rate of @DRUG$ absorption that is clinically unimportant. DDI-false 37 | DDI-DrugBank.d590.s4.p0 For information on drug interactions associated with @DRUG$, refer to the @DRUG$ package insert. DDI-false 38 | DDI-DrugBank.d591.s2.p0 In vivo, the plasma clearance of @DRUG$ was reduced by 70% during coadministration of @DRUG$ (25 mg bid for 2 days), a selective and potent CYP1A2 inhibitor. DDI-mechanism 39 | DDI-DrugBank.d591.s5.p0 Possible interactions with drugs known to be metabolized by CYP1A2 via competitive inhibition such as @DRUG$ and @DRUG$ may also occur. 
DDI-false 40 | DDI-DrugBank.d595.s2.p0 The addition of @DRUG$ to parenteral solutions containing @DRUG$ should be avoided, except where compatibility has been previously established. DDI-false 41 | DDI-DrugBank.d596.s4.p0 The elevation of prothrombin time induced by @DRUG$ is not affected by administration of @DRUG$. DDI-false 42 | DDI-DrugBank.d596.s5.p0 Effect of Other Drugs on the Metabolism of @DRUG$: Drugs that induce or inhibit CYP450 metabolism are not expected to alter the metabolism of @DRUG$. DDI-false 43 | DDI-DrugBank.d597.s1.p0 This interaction should be given consideration in patients taking @DRUG$ concomitantly with @DRUG$. DDI-advise 44 | DDI-DrugBank.d597.s13.p0 In vitro testing found no protein binding drug interaction between @DRUG$ and @DRUG$. DDI-false 45 | DDI-DrugBank.d597.s16.p0 Studies with @DRUG$ agents and @DRUG$ have not demonstrated a reduction in natriuretic effect. DDI-false 46 | DDI-DrugBank.d597.s17.p0 @DRUG$: single and multiple dose pharmacodynamics and pharmacokinetics are not affected by multiple doses of @DRUG$. DDI-false 47 | DDI-DrugBank.d597.s18.p0 Nevertheless, during concomitant therapy with @DRUG$ and @DRUG$, patients should be observed closely for signs of declining renal function, as well as to assure diuretic efficacy. DDI-advise 48 | DDI-DrugBank.d597.s22.p0 Patients on @DRUG$ treatment should be closely monitored when @DRUG$ is introduced or withdrawn. DDI-advise 49 | DDI-DrugBank.d597.s24.p0 @DRUG$ did not have a significant effect on the pharmacokinetics of single doses of @DRUG$. DDI-false 50 | DDI-DrugBank.d597.s25.p0 In vitro, @DRUG$ did not displace @DRUG$ from its human serum binding sites. DDI-false 51 | DDI-DrugBank.d597.s6.p0 @DRUG$ is not a substitute for @DRUG$ for cardiovascular prophylaxis. DDI-false 52 | DDI-DrugBank.d598.s0.p0 @DRUG$ may inhibit the hepatic metabolism of @DRUG$. DDI-mechanism 53 | DDI-DrugBank.d599.s4.p0 If @DRUG$ replacement is required, the @DRUG$ dose should be carefully adjusted. DDI-false 54 | DDI-DrugBank.d599.s6.p0 Limited published data indicate that @DRUG$ treatment increases cytochrome P450 (CP450) mediated @DRUG$ clearance in man. DDI-mechanism 55 | DDI-DrugBank.d602.s0.p0 Caution is recommended when administering @DRUG$ with compounds that are metabolized/eliminated predominantly by the UGT1A1 pathway (e.g. @DRUG$). DDI-advise 56 | DDI-DrugBank.d602.s1.p0 Concomitant treatment with @DRUG$ resulted in a 21% increase in the AUC of @DRUG$. DDI-mechanism 57 | DDI-DrugBank.d602.s2.p0 Caution is recommended when administering @DRUG$ with @DRUG$. DDI-advise 58 | DDI-DrugBank.d603.s0.p0 When administered concomitantly with @DRUG$ , @DRUG$ may enhance or precipitate bradycardia, A.V. DDI-effect 59 | DDI-DrugBank.d603.s3.p0 @DRUG$ has been used in patients concomitantly treated with salt-retaining steroid therapy (i.e., @DRUG$), with or without salt supplementation. DDI-false 60 | DDI-DrugBank.d604.s5.p0 @DRUG$ with @DRUG$ may reduce the effectiveness of the contraception and up to one month after discontinuation of these concomitant therapies. DDI-effect 61 | DDI-DrugBank.d608.s1.p0 However, increased prothrombin time and bleeding have been reported in patients on concomitant @DRUG$ and @DRUG$ therapy. DDI-effect 62 | DDI-DrugBank.d608.s2.p0 Therefore, caution should be exercised when administering @DRUG$ to patients on @DRUG$. DDI-advise 63 | DDI-DrugBank.d608.s4.p0 Caution should be used if @DRUG$ is administered concomitantly with @DRUG$. 
DDI-advise 64 | DDI-DrugBank.d610.s2.p0 @DRUG$ has no effect on the pharmacokinetic properties of @DRUG$. DDI-false 65 | DDI-DrugBank.d613.s0.p0 A study published in 2002 found that @DRUG$ causes a statistically significant increase in plasma clearance of @DRUG$. DDI-mechanism 66 | DDI-DrugBank.d616.s1.p0 Caution should be exercised when administering @DRUG$ therapy in combination with other @DRUG$. DDI-advise 67 | DDI-DrugBank.d619.s0.p0 The occurrence of stupor, muscular rigidity, severe agitation, and elevated temperature has been reported in some patients receiving the combination of @DRUG$ and @DRUG$. DDI-effect 68 | DDI-DrugBank.d619.s2.p0 This is typical of the interaction of @DRUG$ and @DRUG$. DDI-int 69 | DDI-DrugBank.d622.s1.p0 This is especially true if the total dose of @DRUG$ has been large and the administration of @DRUG$ has been delayed. DDI-false 70 | DDI-DrugBank.d625.s1.p0 Impaired renal function has been described in bone marrow transplant patients who were conditioned with high-dose intravenous @DRUG$ and who subsequently received @DRUG$ to prevent graft-versus-host disease DDI-effect 71 | DDI-DrugBank.d627.s0.p0 @DRUG$ should be administered with caution to patients taking @DRUG$ because of the possibility of conduction disturbances. DDI-advise 72 | DDI-DrugBank.d627.s3.p0 These effects should be considered when anticholinergic properties may be contributing to the therapeutic effect of concomitant medication (e.g., @DRUG$, inhaled @DRUG$). DDI-false 73 | DDI-DrugBank.d631.s2.p0 @DRUG$ (@DRUG$) should be administered until normal hematopoiesis is restored. DDI-false 74 | DDI-DrugBank.d631.s3.p0 Mild hepatotoxicity has been reported in some patients when @DRUG$ and @DRUG$ were administered concomitantly. DDI-effect 75 | DDI-DrugBank.d632.s3.p0 When the rapid onset of a concomitant orally administered agent is a critical determinant of effectiveness (such as @DRUG$), the agent should be administered at least 1 hour prior to or 2 hours after @DRUG$ injection. DDI-advise 76 | DDI-DrugBank.d632.s5.p0 No formal interaction studies have been performed to assess the effect of @DRUG$ on the kinetics of oral @DRUG$. DDI-false 77 | DDI-DrugBank.d632.s7.p0 Thus, @DRUG$ and @DRUG$ should not be mixed and must be administered separately. DDI-false 78 | DDI-DrugBank.d633.s0.p0 Since @DRUG$ is a substrate for the metabolic pathways involving CYP2D6 and CYP1A2 enzymes, inhibition or induction of either of these enzymes would be expected to alter @DRUG$ plasma concentrations. DDI-false 79 | DDI-DrugBank.d633.s1.p0 In a formal, single-dose interaction study (n = 6 males) the clearance of @DRUG$ was decreased by 38% following the coadministration of @DRUG$, an inhibitor of CYP1A2. DDI-mechanism 80 | DDI-DrugBank.d633.s11.p0 In a formal study, @DRUG$ were shown not to affect @DRUG$ plasma concentrations. DDI-false 81 | DDI-DrugBank.d633.s2.p0 In another formal study (n = 8 extensive and n = 7 poor metabolizers of CYP2D6), coadministration of @DRUG$ did not alter the kinetics of @DRUG$ in the poor CYP2D6 metabolizer group. DDI-false 82 | DDI-DrugBank.d633.s20.p0 Additionally, in one controlled study in five normal subjects and seven patients, the clearance of @DRUG$ was decreased 50% following the administration of @DRUG$ . DDI-mechanism 83 | DDI-DrugBank.d633.s4.p0 In this crossover steady state study, the pharmacokinetics of @DRUG$ were unaffected in either phenotype by the coadministration of @DRUG$. 
DDI-false 84 | DDI-DrugBank.d633.s6.p0 When concomitant administration of either of these two drugs with @DRUG$ is initiated, the dose of @DRUG$ should be slowly titrated to desired effect. DDI-false 85 | DDI-DrugBank.d634.s3.p0 @DRUG$: Serum digitalis levels may be increased when hyperthyroid patients on a stable @DRUG$ regimen become euthyroid; DDI-false 86 | DDI-DrugBank.d635.s0.p0 Other @DRUG$ should not be used concomitantly with @DRUG$ because they may have additive effects. DDI-advise 87 | DDI-DrugBank.d636.s0.p0 Some @DRUG$ may interact with @DRUG$. DDI-int 88 | DDI-DrugBank.d637.s1.p0 Use of @DRUG$ in conjunction with oral @DRUG$ may increase the risk of respiratory depression, hypotension and profound sedation or coma. DDI-effect 89 | DDI-DrugBank.d638.s0.p0 @DRUG$ may decrease the hypotensive effect of @DRUG$. DDI-effect 90 | DDI-DrugBank.d640.s0.p0 @DRUG$: Excessive reductions in blood pressure may occur in patients on diuretic therapy when @DRUG$ are started. DDI-false 91 | DDI-DrugBank.d640.s8.p0 If a @DRUG$ is also used, the risk of @DRUG$ toxicity may be increased. DDI-effect 92 | DDI-DrugBank.d641.s2.p0 Concomitant administration of substances that are also tubularly secreted (e.g., @DRUG$) could potentially result in delayed clearance of @DRUG$. DDI-mechanism 93 | DDI-DrugBank.d641.s4.p0 Patients with mild to moderate renal insufficiency should avoid taking @DRUG$ with short elimination half-lives for a period of 2 days before, the day of, and 2 days following administration of @DRUG$. DDI-advise 94 | DDI-DrugBank.d642.s0.p0 Prothrombin time or other suitable anticoagulation test should be monitored if @DRUG$ is administered with @DRUG$. DDI-advise 95 | DDI-DrugBank.d645.s4.p0 @DRUG$ decreases the clearance of @DRUG$ by 19%. DDI-mechanism 96 | DDI-DrugBank.d645.s5.p0 @DRUG$ increases the clearance of @DRUG$ by 15%. DDI-mechanism 97 | DDI-DrugBank.d645.s8.p0 @DRUG$ clearance is unaffected by @DRUG$. DDI-false 98 | DDI-DrugBank.d647.s1.p0 @DRUG$ has been reported to prolong the elimination half-life of @DRUG$ and may lead to severe bone marrow suppression; DDI-mechanism 99 | DDI-DrugBank.d647.s2.p0 a reduction of the @DRUG$ dosage should be considered in patients receiving @DRUG$ concurrently. DDI-advise 100 | DDI-DrugBank.d648.s3.p0 If @DRUG$ is given concurrently with @DRUG$, prothrombin time should be monitored. DDI-advise 101 | DDI-DrugBank.d650.s2.p0 Because @DRUG$ is approximately 90% bound to plasma proteins, caution should be exercised if @DRUG$ is coadministered with other drugs known to affect protein binding. DDI-false 102 | DDI-DrugBank.d651.s0.p0 Drug Interactions: Women on oral @DRUG$ have shown a significant increase in plasma @DRUG$ levels. DDI-mechanism 103 | DDI-DrugBank.d652.s2.p0 The finding that @DRUG$ releases catecholamines in patients with essential hypertension suggests that it should be used cautiously, if at all, in patients receiving @DRUG$. DDI-advise 104 | DDI-DrugBank.d652.s6.p0 Because the action of @DRUG$ will influence the delivery of food to the intestines and thus the rate of absorption, @DRUG$ dosage or timing of dosage may require adjustment. DDI-advise 105 | DDI-DrugBank.d653.s4.p0 @DRUG$ may increase the clearance of chronic high dose @DRUG$. DDI-mechanism 106 | DDI-DrugBank.d653.s6.p0 @DRUG$ should be used cautiously in conjunction with @DRUG$ in patients suffering from hypoprothrombinemia. DDI-advise 107 | DDI-DrugBank.d653.s7.p0 The effect of @DRUG$ on oral @DRUG$ is variable. 
DDI-effect 108 | DDI-DrugBank.d653.s8.p0 There are reports of enhanced as well as diminished effects of @DRUG$ when given concurrently with @DRUG$. DDI-effect 109 | DDI-DrugBank.d654.s2.p0 Based on in vitro P450 inhibition studies and the low systemic exposure observed following topical application of @DRUG$, @DRUG$ is unlikely to affect the metabolism of other P450 substrates. DDI-false 110 | DDI-DrugBank.d656.s3.p0 It is recommended that plasma @DRUG$ levels be monitored when initiating, adjusting and discontinuing @DRUG$. DDI-advise 111 | DDI-DrugBank.d658.s0.p0 When administered concurrently, @DRUG$ may increase the effects of oral @DRUG$; DDI-effect 112 | DDI-DrugBank.d661.s2.p0 The median duration of action of @DRUG$ 0.6 mg/kg administered after a 1 mg/kg dose of @DRUG$ when T 1 returned to 75% of control was 36 minutes (range 14-57, n=12) vs. DDI-false 113 | DDI-DrugBank.d661.s4.p0 There are no controlled studies documenting the use of @DRUG$ before or after other @DRUG$. DDI-false 114 | DDI-DrugBank.d665.s7.p0 Displacement of @DRUG$ from plasma protein binding sites will elevate the level of free @DRUG$ in the serum. DDI-false 115 | DDI-DrugBank.d666.s0.p0 Catecholamine-depleting drugs (e.g., @DRUG$) may have an additive effect when given with @DRUG$. DDI-effect 116 | DDI-DrugBank.d666.s3.p0 @DRUG$ has been shown to increase serum @DRUG$ levels when both drugs are co-administered. DDI-mechanism 117 | DDI-DrugBank.d668.s1.p0 Furthermore, @DRUG$, @DRUG$, St. DDI-false 118 | DDI-DrugBank.d669.s0.p0 Concomitant administration of @DRUG$ and @DRUG$ has been associated with erythema and histamine-like flushing and anaphylactoid reactions. DDI-effect 119 | DDI-DrugBank.d670.s0.p0 Potential for @DRUG$ to Affect Other Drugs @DRUG$ is not expected to cause clinically important pharmacokinetic interactions with drugs that are metabolized by cytochrome P450 isozymes. DDI-false 120 | DDI-DrugBank.d670.s9.p0 Potential for Other Drugs to Affect @DRUG$ @DRUG$ is not a substrate of CYP1A2, CYP2A6, CYP2C9, and CYP2C19, so that an interaction with inhibitors or inducers of these isozymes is unlikely. DDI-false 121 | DDI-DrugBank.d671.s0.p0 The interaction of @DRUG$ (@DRUG$) with other drugs has not been studied in humans. DDI-false 122 | DDI-DrugBank.d671.s2.p0 Decreased seizure threshold has been reported in patients receiving @DRUG$ concomitantly with @DRUG$. DDI-effect 123 | DDI-DrugBank.d676.s4.p0 No cross-resistance with other @DRUG$, radiotherapy or @DRUG$ has been demonstrated. DDI-false 124 | DDI-DrugBank.d677.s1.p0 as well as @DRUG$ and @DRUG$. DDI-false 125 | DDI-DrugBank.d678.s1.p0 The metabolism of @DRUG$ is accelerated by @DRUG$; DDI-mechanism 126 | DDI-DrugBank.d680.s3.p0 Therefore, when coadministered with @DRUG$, adjustment of the dosage of @DRUG$ or of such drugs may not be necessary. DDI-false 127 | DDI-DrugBank.d680.s7.p0 Patients treated with @DRUG$ and @DRUG$ concomitantly should be monitored for increases in INR and prothrombin time. DDI-advise 128 | DDI-DrugBank.d686.s0.p0 @DRUG$ may compete with other drugs, such as @DRUG$, for sites of metabolism in the liver, thus elevating the serum levels of such compounds to potentially toxic levels. DDI-mechanism 129 | DDI-DrugBank.d686.s1.p0 Therefore, when concomitant use of @DRUG$ and @DRUG$ is anticipated, it may be necessary to monitor blood levels and/or reduce the dosage of such compounds. DDI-advise 130 | DDI-DrugBank.d687.s0.p0 Administration of @DRUG$ decreases oral clearance of @DRUG$ by about 5%. 
DDI-mechanism 131 | DDI-DrugBank.d688.s2.p0 Administration of a higher dose of @DRUG$ should be considered when coadministering with @DRUG$. DDI-advise 132 | DDI-DrugBank.d689.s3.p0 The pressor response of @DRUG$ may also be potentiated by @DRUG$. DDI-effect 133 | DDI-DrugBank.d693.s1.p0 When daily doses of @DRUG$ 2 g and weekly doses of @DRUG$ 7.5 mg were coadministered to 15 rheumatoid arthritis patients in a drug-drug interaction study, the pharmacokinetic disposition of the drugs was not altered. DDI-false 134 | DDI-DrugBank.d693.s2.p0 Daily doses of @DRUG$ 2 g (maximum 3 g) and weekly doses of @DRUG$ 7.5 mg (maximum 15 mg) were administered alone or in combination to 310 rheumatoid arthritis patients in two controlled 52-week clinical studies. DDI-false 135 | DDI-DrugBank.d694.s2.p0 No dosage adjustment of @DRUG$ or @DRUG$ is recommended. DDI-false 136 | DDI-DrugBank.d696.s0.p0 @DRUG$ may reverse the analgesic activity of @DRUG$. DDI-effect 137 | DDI-DrugBank.d698.s1.p0 @DRUG$ - concurrent use decreases gastrointestinal absorption of @DRUG$; DDI-mechanism 138 | DDI-DrugBank.d698.s2.p0 requirements for @DRUG$ may be increased in patients receiving @DRUG$. DDI-effect 139 | DDI-DrugBank.d700.s1.p0 It is recommended that @DRUG$ and @DRUG$ not be administered simultaneously. DDI-advise 140 | DDI-DrugBank.d700.s10.p0 However, in a drug-drug interaction study, mean @DRUG$ AUC was decreased by 15% when coadministered with @DRUG$. DDI-mechanism 141 | DDI-DrugBank.d700.s11.p0 Therefore, it is recommended that oral @DRUG$ are co- administered with @DRUG$ with caution and additional birth control methods be considered. DDI-advise 142 | DDI-DrugBank.d700.s16.p0 Interference of @DRUG$ hydrolysis may lead to less @DRUG$ available for absorption. DDI-false 143 | DDI-DrugBank.d704.s0.p0 Drug-Drug Interactions: The pharmacokinetic and pharmacodynamic interactions between @DRUG$ capsules and other @DRUG$ have not been determined. DDI-false 144 | DDI-DrugBank.d704.s1.p0 However, interactions may be expected and @DRUG$ capsules should NOT be used in combination with other @DRUG$. DDI-advise 145 | DDI-DrugBank.d704.s2.p0 The pharmacokinetic interaction between @DRUG$ and @DRUG$ capsules was investigated. DDI-false 146 | DDI-DrugBank.d704.s4.p0 Therefore, @DRUG$ capsules should be used with caution in combination with @DRUG$, particularly at doses higher than 0.4 mg. DDI-advise 147 | DDI-DrugBank.d704.s5.p0 Results from limited in vitro and in vivo drug-drug interaction studies between @DRUG$ and @DRUG$ are inconclusive. DDI-false 148 | DDI-DrugBank.d704.s6.p0 Therefore, caution should be exercised with concomitant administration of @DRUG$ and @DRUG$ capsules. DDI-advise 149 | DDI-DrugBank.d706.s0.p0 Catecholamine-depleting drugs (e.g., @DRUG$) may have an additive effect when given with @DRUG$. DDI-effect 150 | DDI-DrugBank.d709.s5.p0 Although such a risk is not verified for @DRUG$, combination of @DRUG$ with such drugs is not recommended. DDI-false 151 | DDI-DrugBank.d710.s2.p0 @DRUG$: @DRUG$, 100 mg, was given twice daily for 13 days to 12 healthy volunteers. DDI-false 152 | DDI-DrugBank.d710.s5.p0 Coadministration with @DRUG$ reduced the peak concentration and AUC values for @DRUG$ by approximately 11%. DDI-mechanism 153 | DDI-DrugBank.d710.s6.p0 @DRUG$: @DRUG$, 100 mg, was given twice daily fro 13 days to 12 healthy volunteers. DDI-false 154 | DDI-DrugBank.d710.s9.p0 Peak plasma concentrations and AUC of @DRUG$ were reduced approximately 10% in the presence of @DRUG$. 
DDI-mechanism 155 | DDI-DrugBank.d711.s3.p0 The concurrent use of @DRUG$ and @DRUG$ has been reported to result in fatal renal toxicity. DDI-effect 156 | DDI-DrugBank.d713.s0.p0 The concomitant use of @DRUG$ with other @DRUG$ that produce dry mouth, constipation, and other anticholinergic pharmacological effects may increase the frequency and/or severity of such effects. DDI-effect 157 | DDI-DrugBank.d713.s3.p0 Coadministration of @DRUG$ with drugs that are eliminated by active renal tubular secretion may increase the serum concentration of @DRUG$ and/or the coadministered drug due to competition for this elimination pathway. DDI-false 158 | DDI-DrugBank.d716.s1.p0 Changes in @DRUG$ and other diabetes drug therapies may be necessary during treatment with @DRUG$. DDI-advise 159 | DDI-DrugBank.d720.s2.p0 @DRUG$ reduce @DRUG$ clearance, significantly increasing systemic exposure. DDI-mechanism 160 | DDI-DrugBank.d720.s4.p0 If concomitant treatment with @DRUG$ and an @DRUG$ is clinically warranted, appropriate observation of the patient is advised. DDI-advise 161 | DDI-DrugBank.d721.s3.p0 The pharmacokinetic parameters of @DRUG$ are not significantly modified by @DRUG$ coadministration. DDI-false 162 | DDI-DrugBank.d721.s4.p0 In vitro studies show that @DRUG$ does not displace @DRUG$ from its binding site on protein. DDI-false 163 | DDI-DrugBank.d725.s3.p0 The pressor response of @DRUG$ may also be potentiated by @DRUG$. DDI-effect 164 | DDI-DrugBank.d726.s2.p0 Dose adjustment of @DRUG$ Capsules may be required, and iPTH and serum calcium concentrations should be closely monitored if a patient initiates or discontinues therapy with a strong CYP3A4 inhibitor such as @DRUG$. DDI-advise 165 | DDI-DrugBank.d727.s0.p0 @DRUG$ competitively inhibits the intracellular phosphorylation of @DRUG$. DDI-effect 166 | DDI-DrugBank.d727.s1.p0 Therefore, use of @DRUG$ in combination with @DRUG$ should be avoided. DDI-advise 167 | DDI-DrugBank.d728.s1.p0 No pharmacokinetic interaction between 85 mg/m2 @DRUG$ and infusional @DRUG$ has been observed in patients treated every 2 weeks. DDI-false 168 | DDI-DrugBank.d728.s2.p0 Increases of @DRUG$ plasma concentrations by approximately 20% have been observed with doses of 130 mg/m2 @DRUG$ dosed every 3 weeks. DDI-mechanism 169 | DDI-DrugBank.d732.s0.p0 @DRUG$ is not known to interact with other drugs including @DRUG$ supplements; DDI-false 170 | DDI-DrugBank.d732.s2.p0 Concomitant administration of @DRUG$ with other chelation therapy, such as @DRUG$ is not recommended. DDI-advise 171 | DDI-DrugBank.d734.s0.p0 @DRUG$ may accentuate the orthostatic hypotension that may occur with @DRUG$. DDI-effect 172 | DDI-DrugBank.d734.s1.p0 Antihypertensive effects of @DRUG$ and related compounds may be counteracted when @DRUG$ are used concomitantly. DDI-effect 173 | DDI-DrugBank.d734.s2.p0 Concomitant administration of @DRUG$ with @DRUG$ results in increased plasma levels of both drugs. DDI-mechanism 174 | DDI-DrugBank.d736.s1.p0 Hypokalemia may develop during concomitant use of @DRUG$ or @DRUG$. DDI-false 175 | DDI-DrugBank.d736.s3.p0 @DRUG$ may decrease arterial responsiveness to @DRUG$. DDI-effect 176 | DDI-DrugBank.d736.s5.p0 @DRUG$ may increase the responsiveness of @DRUG$. DDI-effect 177 | DDI-DrugBank.d736.s7.p0 @DRUG$ may add to or potentiate the action of other @DRUG$. DDI-effect 178 | DDI-DrugBank.d736.s8.p0 Potentiation occurs with @DRUG-DRUG$. 
DDI-false 179 | DDI-DrugBank.d737.s7.p0 CYP interactions: Inhibitors of cytochrome P450 enzymes would not be expected to affect @DRUG$ elimination because @DRUG$ is not appreciably metabolized by these enzymes in vivo or in vitro. DDI-false 180 | DDI-DrugBank.d738.s0.p0 Do not exceed a 5 mg daily dose of @DRUG$ when administered with therapeutic doses of @DRUG$ or other potent CYP3A4 inhibitors. DDI-advise 181 | DDI-DrugBank.d739.s0.p0 @DRUG$ at a Dose of 10 mg Once Daily Dosed to Pharmacokinetic Steady State - did not cause clinically significant changes in the kinetics of a single intravenous dose of @DRUG$ (predominantly a cytochrome P450 1A2 substrate). DDI-false 182 | DDI-DrugBank.d739.s1.p0 - did not change the pharmacokinetic profile of @DRUG$ (a substrate of cytochromes P450 2A6 and 2C9) or influence the effect of a single 30-mg oral dose of @DRUG$ on prothrombin time or the INR (International Normalized Ratio). DDI-false 183 | DDI-DrugBank.d742.s10.p0 However, the impairment of motor skills produced by @DRUG$ has been shown to be additive with those caused by @DRUG$. DDI-effect 184 | DDI-DrugBank.d742.s11.p0 Accordingly, patients should be advised to avoid @DRUG$ and other similar drugs while taking @DRUG$. DDI-advise 185 | DDI-DrugBank.d742.s2.p0 Drugs Affecting Hepatic Metabolism The metabolism and pharmacokinetics of @DRUG$ (@DRUG$) Orally Disintegrating Tablets may be affected by the induction or inhibition of drug-metab-olizing enzymes. DDI-false 186 | DDI-DrugBank.d742.s8.p0 However, the impairment of cognitive and motor skills produced by @DRUG$ were shown to be additive with those produced by @DRUG$. DDI-effect 187 | DDI-DrugBank.d743.s11.p0 A few spontaneous accounts of QT interval prolongation with ventricular arrhythmia including torsades de pointes, have been reported in patients receiving @DRUG$ or @DRUG$. DDI-false 188 | DDI-DrugBank.d743.s13.p0 Studies to evaluate potential interactions of @DRUG$ with @DRUG$ are in progress. DDI-false 189 | DDI-DrugBank.d743.s4.p0 Concomitant administration of @DRUG$ and @DRUG$ is contraindicated. DDI-advise 190 | DDI-DrugBank.d743.s7.p0 Concomitant administration of @DRUG$ and @DRUG$ is contraindicated. DDI-advise 191 | DDI-DrugBank.d750.s0.p0 Concomitant use of @DRUG$ with @DRUG$ may result in an adverse drug interaction. DDI-int 192 | DDI-DrugBank.d752.s9.p0 The difficulty in achieving adequate sedation may have been the result of decreased absorption of the @DRUG$ due to both the gastrointestinal effects and stimulant effects of @DRUG$. DDI-false 193 | DDI-DrugBank.d753.s2.p0 @DRUG$ or other medications with anticholinergic activity when used concurrently with @DRUG$ may result in increased risk of urinary retention and/or severe constipation, which may lead to paralytic ileus. DDI-effect 194 | DDI-DrugBank.d753.s3.p0 It has been reported that the incidence of bradycardia was increased when @DRUG$ was combined with @DRUG$ for induction of anesthesia. DDI-effect 195 | DDI-DrugBank.d753.s4.p0 In addition, CNS toxicity has been reported (confusion, disorientation, respiratory depression, apnea, seizures) following coadministration of @DRUG$ with @DRUG$; DDI-effect 196 | DDI-DrugBank.d756.s1.p0 Combination with @DRUG$ (@DRUG$) is associated with increased risk of seizures. DDI-false 197 | DDI-DrugBank.d758.s0.p0 The CNS depressant effects of @DRUG$ may be additive with that of other @DRUG$.. DDI-effect 198 | DDI-DrugBank.d759.s0.p0 Limited evidence suggests that @DRUG$ may influence the intensity and duration of action of @DRUG$. 
DDI-mechanism 199 | DDI-DrugBank.d761.s12.p0 Although undercarboxylated osteocalcin, a marker of vitamin K nutritional status, was unaltered with @DRUG$ administration, vitamin K levels tended to decline in subjects taking @DRUG$. DDI-false 200 | DDI-DrugBank.d761.s4.p0 @DRUG$ inhibited absorption of a @DRUG$ supplement by approximately 60%. DDI-mechanism 201 | DDI-DrugBank.d762.s2.p0 Therefore, the coadministration of @DRUG$ with @DRUG$ is not recommended. DDI-advise 202 | DDI-DrugBank.d763.s1.p0 Because @DRUG$ is metabolized by hepatic cytochrome P-450 drug-metabolizing enzymes (CYP3A4, CYP2D6, CYP1A2), inducers or inhibitors of these enzymes may change the clearance and, hence, the half-life of @DRUG$. DDI-false 203 | DDI-DrugBank.d763.s7.p0 @DRUG$ did not increase blood levels of high-dose @DRUG$. DDI-false 204 | DDI-DrugBank.d765.s0.p0 CYP3A4 Inhibitors: @DRUG$, an inhibitor of the drug metabolizing enzyme CYP3A4, significantly increased plasma concentrations of @DRUG$ when coadministered to subjects who were poor metabolizers (see CLINICAL PHARMACOLOGY, Variability in Metabolism and Drug-Drug Interactions). DDI-mechanism 205 | DDI-DrugBank.d766.s0.p0 Since @DRUG$ is metabolized by a P450 enzyme system, drugs that induce or inhibit this drug metabolizing enzyme system may elicit important drug-drug interactions that may alter @DRUG$ plasma concentrations. DDI-false 206 | DDI-DrugBank.d769.s1.p0 Physicians should carefully monitor PT and INR in patients concurrently administered @DRUG$ and @DRUG$. DDI-advise 207 | DDI-DrugBank.d770.s4.p0 however, in patients with Paget's Disease prior @DRUG$ use appears to reduce the anti-resorptive response to @DRUG$ nasal spray. DDI-effect 208 | DDI-DrugBank.d775.s1.p0 Prior administration of @DRUG$ can potentiate the neuromuscular blocking effects of @DRUG$. DDI-effect 209 | DDI-DrugBank.d775.s2.p0 Evidence of spontaneous recovery from @DRUG$ should be observed before the administration of @DRUG$. DDI-advise 210 | DDI-DrugBank.d775.s4.p0 There are no clinical data on the use of @DRUG$ with other @DRUG$. DDI-false 211 | DDI-DrugBank.d776.s17.p0 Depending on clinical circumstances, consideration should be given to increasing the dose of @DRUG$ when it is administered with @DRUG$. DDI-advise 212 | DDI-DrugBank.d776.s23.p0 The combined use of @DRUG$ at doses of 400 mg or greater with @DRUG$ is contraindicated. DDI-advise 213 | DDI-DrugBank.d776.s24.p0 The coadministration of @DRUG$ at doses lower than 400 mg/day with @DRUG$ should be carefully monitored. DDI-advise 214 | DDI-DrugBank.d776.s27.p0 The combined use of @DRUG$ with @DRUG$ is contraindicated. DDI-advise 215 | DDI-DrugBank.d776.s3.p0 one fatality has been reported from hypoglycemia in association with combined @DRUG$ and @DRUG$ use. DDI-effect 216 | DDI-DrugBank.d776.s32.p0 Patients receiving @DRUG$ and @DRUG$ concomitantly should be carefully monitored. DDI-advise 217 | DDI-DrugBank.d776.s34.p0 Patients receiving @DRUG$ and @DRUG$ concomitantly should be carefully monitored. DDI-advise 218 | DDI-DrugBank.d776.s39.p0 however, in some patients there were decreases up to 47% and 33% of @DRUG$ and @DRUG$ levels. DDI-false 219 | DDI-DrugBank.d776.s8.p0 Careful monitoring of prothrombin time in patients receiving @DRUG$ and @DRUG$ is recommended. DDI-advise 220 | DDI-DrugBank.d777.s0.p0 The interaction of @DRUG$, @DRUG$, with other drugs has not been well studied. 
DDI-false 221 | DDI-DrugBank.d777.s3.p0 In the treatment of acute MI, @DRUG$, when not otherwise contraindicated, should be administered with @DRUG$ ( see below ). DDI-false 222 | DDI-DrugBank.d777.s4.p0 Anticoagulation and @DRUG$ After Treatment for Myocardial Infarction -- In the treatment of acute myocardial infarction, the use of @DRUG$ has been shown to reduce the incidence of reinfarction and stroke. DDI-false 223 | DDI-DrugBank.d777.s5.p0 The addition of @DRUG$ to @DRUG$ causes a minimal increase in the risk of minor bleeding (3.9% vs. 3.1%), but does not appear to increase the incidence of major bleeding (see DDI-effect 224 | DDI-DrugBank.d779.s0.p0 When @DRUG$ is used with other @DRUG$, potentiation of antihypertensive effect may occur. DDI-effect 225 | DDI-DrugBank.d779.s2.p0 Patients may require reduced doses of @DRUG$ when on @DRUG$. DDI-advise 226 | DDI-DrugBank.d781.s1.p0 The possibility of hypotensive effects can be minimized by either discontinuing the @DRUG$ or increasing the salt intake prior to initiation of treatment with @DRUG$. DDI-effect 227 | DDI-DrugBank.d781.s10.p0 Use of a @DRUG$ may further increase the risk of @DRUG$ toxicity. DDI-false 228 | DDI-DrugBank.d781.s15.p0 Food Interaction: Oral administration of @DRUG$ Tablets with food does not significantly lower the rate or extent of @DRUG$ absorption relative to the fasted state. DDI-false 229 | DDI-DrugBank.d781.s16.p0 However, the extent of biotransformation of @DRUG$ to the active metabolite, @DRUG$, is reduced approximately 43%, resulting in a reduction in the plasma ACE inhibition curve of approximately 20%, probably clinically insignificant. DDI-false 230 | DDI-DrugBank.d781.s2.p0 If @DRUG$ cannot be interrupted, close medical supervision should be provided with the first dose of @DRUG$ Tablets, for at least two hours and until blood pressure has stabilized for another hour. DDI-advise 231 | DDI-DrugBank.d781.s3.p0 The rate and extent of @DRUG$ absorption and elimination are not affected by concomitant @DRUG$. DDI-false 232 | DDI-DrugBank.d781.s4.p0 The bioavailability of @DRUG$ was reduced by @DRUG$, however, and this was associated with a decrease in plasma ACE inhibition. DDI-false 233 | DDI-DrugBank.d784.s0.p0 The concomitant use of @DRUG$ with other @DRUG$ or with other agents which produce dry mouth, constipation, somnolence (drowsiness), and/or other anticholinergic-like effects may increase the frequency and/or severity of such effects. DDI-effect 234 | DDI-MedLine.d143.s0.p0 [Interaction between @DRUG$ and @DRUG$]. DDI-false 235 | DDI-MedLine.d143.s1.p0 The drug interaction between @DRUG$ and @DRUG$ has been the subject of much study in recent years. DDI-false 236 | DDI-MedLine.d143.s2.p0 Contradictory results regarding the effect of @DRUG$ on platelet reactivity and on clinical outcome in @DRUG$-treated patients have been reported in literature. DDI-false 237 | DDI-MedLine.d143.s4.p0 @DRUG$ has a much weaker effect on @DRUG$'s pharmacokinetics and on platelet reactivity during concomitant use. DDI-mechanism 238 | DDI-MedLine.d143.s5.p0 The influence of the other @DRUG$ when used simultaneously with @DRUG$ has not yet been investigated in adequately randomized studies. DDI-false 239 | DDI-MedLine.d143.s7.p0 To date, there is no conclusive evidence of a clinically-relevant interaction between any of the @DRUG$ and @DRUG$. DDI-false 240 | DDI-MedLine.d147.s0.p0 Interaction of @DRUG$ and @DRUG$, in vitro and in vivo, in human A375 melanoma cells. 
DDI-false 241 | DDI-MedLine.d147.s2.p0 The effect of @DRUG$ and @DRUG$, alone and in combination, was evaluated for growth inhibition (MTT), radiolabeled drug uptake, cellular thiol content (HPLC), and DNA strand breaks (Comet assay). DDI-false 242 | DDI-MedLine.d147.s6.p0 There was a >75% reduction in cellular glutathione and cysteine with @DRUG$ but not @DRUG$. DDI-false 243 | DDI-MedLine.d147.s8.p0 @DRUG$ and @DRUG$ show additive effects in vitro but not in vivo in human A375 melanoma cells. DDI-effect 244 | DDI-MedLine.d154.s0.p0 Enhancement of humoral immune responses to inactivated Newcastle disease and @DRUG$ by oral administration of @DRUG$ in chickens. DDI-false 245 | DDI-MedLine.d154.s3.p0 In experiment 1, oral administration of @DRUG$ at a dose of 5 mg/kg of BW for 7 d on the immune response in chickens intramuscularly injected with @DRUG$ was evaluated. DDI-false 246 | DDI-MedLine.d154.s5.p0 In experiment 2, the same regimen of @DRUG$ was administered to chickens inoculated with @DRUG$, and an enhanced serum antibody response to AI vaccination was also observed. DDI-effect 247 | DDI-MedLine.d154.s6.p0 Considering the safety of @DRUG$, because no adverse effect was found throughout the experiments, @DRUG$ may be a promising oral adjuvant to improve immunization in poultry. DDI-false 248 | DDI-MedLine.d159.s0.p0 Protective effect of @DRUG$ and @DRUG$ against the acute toxicity of diepoxybutane to human lymphocytes. DDI-false 249 | DDI-MedLine.d159.s6.p0 (ii) @DRUG$ elicits a significant protective effect on DEB induced toxicity, which was potentiated by @DRUG$. DDI-effect 250 | DDI-MedLine.d161.s3.p0 In the present study, we tested whether the highly specific metabotropic glutamate receptor 5 antagonist, @DRUG$, reduces dopaminergic and noradrenergic neuronal loss in monkeys rendered parkinsonian by chronic treatment with low doses of @DRUG$. DDI-false 251 | DDI-MedLine.d161.s7.p0 The @DRUG$ treatment also had a significant effect on the @DRUG$-induced loss of norepinephrine neurons in the locus coeruleus and adjoining A5 and A7 noradrenaline cell groups. DDI-effect 252 | DDI-MedLine.d161.s9.p0 Our data demonstrate that chronic treatment with the metabotropic glutamate receptor 5 antagonist, @DRUG$, significantly reduces @DRUG$ toxicity towards dopaminergic and noradrenergic cell groups in non-human primates. DDI-effect 253 | DDI-MedLine.d163.s4.p0 The susceptibility results for @DRUG$ and @DRUG$ were interpreted according to the most relevant criteria. DDI-false 254 | DDI-MedLine.d163.s7.p0 The combination of @DRUG$ and @DRUG$ can be synergistic against MRSA. DDI-effect 255 | DDI-MedLine.d164.s0.p0 @DRUG$ enhances @DRUG$-mediated antibody-dependent cellular cytotoxicity by up-regulation of cell surface HER2 expression. DDI-effect 256 | DDI-MedLine.d164.s3.p0 In an in vitro assay, @DRUG$ induced HER2 expression at the cell surface of HER2-positive breast cancer cell lines, leading to the enhancement of @DRUG$-mediated ADCC. DDI-effect 257 | DDI-MedLine.d164.s4.p0 Furthermore, we present a case report in which a second @DRUG$ treatment following @DRUG$ resulted in the marked shrinkage of multiple metastatic tumors in HER2-positive breast cancer. DDI-effect 258 | DDI-MedLine.d166.s0.p0 Improved parathyroid hormone control by @DRUG$ is associated with reduction in @DRUG$ requirement in patients with end-stage renal disease. DDI-false 259 | DDI-MedLine.d166.s1.p0 Uncontrolled hy-per-parathyroidism causes bone marrow fibrosis, leading to @DRUG$ (@DRUG$) resistance. 
DDI-false 260 | DDI-MedLine.d166.s17.p0 Reduction of PTH by @DRUG$ is associated with a decrease in @DRUG$ requirement. DDI-effect 261 | DDI-MedLine.d166.s2.p0 Medical treatment with @DRUG$ is effective in reducing plasma parathyroid hormone (PTH) levels, but its effect on @DRUG$ dosing is unknown. DDI-false 262 | DDI-MedLine.d166.s6.p0 The primary objective of the study was to ascertain the difference in @DRUG$ responsiveness before and after 12 months of @DRUG$ therapy. DDI-false 263 | DDI-MedLine.d169.s6.p0 We also evaluated the effects of @DRUG$ doses between 200 and 800 mg on absorption of 5 mg @DRUG$ [as concentrated RBC (CRBC)]. DDI-false 264 | DDI-MedLine.d169.s8.p0 @DRUG$ doses 1000 mg diminished @DRUG$ absorption by an average of 49.6%. DDI-mechanism 265 | DDI-MedLine.d169.s9.p0 A @DRUG$ dose of 800 mg diminished absorption of 5 mg @DRUG$ by 37.7%. DDI-mechanism 266 | DDI-MedLine.d179.s0.p0 Synergistic interaction between @DRUG$ and @DRUG$ is sequence dependent in human non-small lung cancer with EGFR TKIs-resistant mutation. DDI-int 267 | DDI-MedLine.d179.s3.p0 NSCLC cell lines with EGFR T790M mutation and K-ras mutation were exposed to either @DRUG$ or @DRUG$ or both based on various sequential administrations. DDI-false 268 | DDI-MedLine.d179.s5.p0 @DRUG$ exhibited dose-dependent growth inhibition in NSCLC cell lines and arrested cell cycle at G1 phase, whereas @DRUG$ arrested at S phase. DDI-false 269 | DDI-MedLine.d179.s6.p0 Although single or concurrent use of @DRUG$ and @DRUG$ has some anti-proliferative effects, the sequential administrations of both drugs remarkably enhanced anti-tumor activity. DDI-effect 270 | DDI-MedLine.d179.s7.p0 When cells were exposed to @DRUG$ followed by @DRUG$, synergism was observed. DDI-effect 271 | DDI-MedLine.d179.s8.p0 The molecular basis of this synergism is that the signaling pathways that were initially activated by @DRUG$ exposure were efficiently suppressed by the subsequent exposure to @DRUG$. DDI-effect 272 | DDI-MedLine.d180.s1.p0 The @DRUG$ (@DRUG$), one of S-adenosylhomocysteine (AdoHcy) hydrolase inhibitors, has shown antitumor activities in a broad range of solid tumors and acute myeloid leukemia. DDI-false 273 | DDI-MedLine.d181.s4.p0 The authors report the case of an infant with confirmed congenital hypothyroidism on @DRUG$ who experienced a possible drug interaction with @DRUG$. DDI-int 274 | DDI-MedLine.d181.s6.p0 Questioning revealed the child was taking @DRUG$ drops before feeds while on @DRUG$. DDI-false 275 | DDI-MedLine.d181.s8.p0 Drug interaction of @DRUG$ with @DRUG$ has not been reported previously and is not listed in the British National Formulary for Children. DDI-false 276 | DDI-MedLine.d184.s0.p0 @DRUG$ locomotor sensitization and conditioned place preference in adolescent male and female rats neonatally treated with @DRUG$. DDI-false 277 | DDI-MedLine.d184.s2.p0 The objective was to analyze the effects of neonatal @DRUG$ treatment on effects of @DRUG$ in adolescent rats using locomotor sensitization and conditioned place preference procedures. DDI-false 278 | DDI-MedLine.d184.s6.p0 Male rats demonstrated sensitization to @DRUG$, although this was muted compared with female rats, and were unaffected by neonatal @DRUG$. DDI-false 279 | DDI-MedLine.d185.s12.p0 Co-administration of @DRUG$ with CYP3A inhibitors is feasible as the observed increase in @DRUG$ PK parameters was not considered clinically relevant. 
DDI-false 280 | DDI-MedLine.d185.s13.p0 Considering the variability in exposure following enzyme inhibition and the fact that chronic dosing of @DRUG$ was not studied with CYP3A inhibitors, close monitoring of @DRUG$-related adverse events is necessary. DDI-false 281 | DDI-MedLine.d185.s2.p0 This study evaluated the effect of a potent CYP3A inhibitor, @DRUG$, on the pharmacokinetics and safety of @DRUG$. DDI-false 282 | DDI-MedLine.d185.s5.p0 On day 8, a single @DRUG$ dose was co-administered with @DRUG$. DDI-false 283 | DDI-MedLine.d185.s7.p0 In the presence of @DRUG$, there was 1.6- and 1.8-fold increase in C (max) and AUC of @DRUG$, respectively. DDI-mechanism 284 | DDI-MedLine.d189.s11.p0 These results suggest that rat liver slices do not reproduce the reported in vivo biotransformation of @DRUG$ and therefore may not be an appropriate model for @DRUG$ metabolism. DDI-false 285 | DDI-MedLine.d189.s3.p0 The incubations were performed in the absence and presence of the non-specific CYP inhibitor, @DRUG$ (@DRUG$) and isoform-specific inhibitors. DDI-false 286 | DDI-MedLine.d189.s7.p0 @DRUG$ decreased the toxicity of @DRUG$, increased exposure to parent compound, and decreased metabolite levels in a dose-dependent manner. DDI-effect 287 | DDI-MedLine.d189.s8.p0 Of the isoform-specific CYP inhibitors tested for an effect on the @DRUG$ metabolite profile, only @DRUG$ was noticeably effective, indicating a role of CYPs 2A6, 2C9, 2Cl9, and 2E1. DDI-false 288 | DDI-MedLine.d190.s1.p0 To determine whether the co-injection of extracellular matrix degrading enzymes improves retinal transduction following intravitreal delivery of @DRUG$ (@DRUG$). DDI-false 289 | DDI-MedLine.d190.s5.p0 The addition of @DRUG$ or @DRUG$ greatly enhanced transduction of the retinal ganglion cell layer and increased the depth of transduction into the outer retina. DDI-false 290 | DDI-MedLine.d190.s6.p0 @DRUG$ had a limited effect and @DRUG$ was ineffective. DDI-false 291 | DDI-MedLine.d190.s7.p0 Electroretinograms survived with higher concentrations of @DRUG$ and @DRUG$ than were required for optimal retinal transduction. DDI-false 292 | DDI-MedLine.d194.s0.p0 The role of p27(Kip1) in @DRUG$-enhanced @DRUG$ cytotoxicity in human ovarian cancer cells. DDI-effect 293 | DDI-MedLine.d194.s11.p0 @DRUG$ + @DRUG$ vs. DDI-false 294 | DDI-MedLine.d194.s15.p0 The siRNA knockdown of p27(Kip1) decreased @DRUG$- and @DRUG$-induced apoptosis compared with a negative control siRNA (sub-G1 fraction, control siRNA vs. p27(Kip1) siRNA: 42.5% vs. 20.1%, difference = 22.4%, 95% CI = 20.1% to 24.7%, P = .017). DDI-false 295 | DDI-MedLine.d194.s16.p0 Studies with forced expression and siRNA knockdown of Bcl-2 and Cdk1 suggest that @DRUG$-mediated induction of p27(Kip1) enhanced @DRUG$-induced apoptosis by negatively regulating Bcl-2 and Cdk1 expression. DDI-effect 296 | DDI-MedLine.d194.s17.p0 Inhibition of Src family and Abl kinases with either siRNAs or @DRUG$ enhances @DRUG$ sensitivity of ovarian cancer cells through p27(Kip1)-mediated suppression of Bcl-2 and Cdk1 expression. DDI-effect 297 | DDI-MedLine.d194.s4.p0 The effect of @DRUG$, an inhibitor of Src and Abl kinases, on @DRUG$ sensitivity was measured in ovarian cancer cells and HEY xenografts. 
DDI-false 298 | DDI-MedLine.d194.s5.p0 The roles of p27(Kip1), Bcl-2, and Cdk1 in apoptosis induced by @DRUG$ and @DRUG$ were assessed using a terminal deoxynucleotidyl transferase-mediated dUTP nick-end labeling (TUNEL) assay, siRNA knockdown of gene expression, transfection with Bcl-2 and Cdk1 expression vectors, and flow cytometry. DDI-false 299 | DDI-MedLine.d194.s9.p0 HEY cells treated with @DRUG$ plus @DRUG$ formed fewer colonies than did cells treated with either agent alone. DDI-effect 300 | DDI-MedLine.d195.s0.p0 Influence of @DRUG$ on @DRUG$ induced antinociception and its pharmacokinetics. DDI-false 301 | DDI-MedLine.d195.s5.p0 @DRUG$ plasma concentration was also increased when it was administered with @DRUG$. DDI-mechanism 302 | DDI-MedLine.d195.s7.p0 From this study it can be concluded that @DRUG$ can be used as a bioenhancer along with @DRUG$. DDI-effect 303 | DDI-MedLine.d199.s0.p0 Prevention of emergence agitation in seven children receiving low-dose @DRUG$ and @DRUG$ total intravenous anesthesia. DDI-false 304 | DDI-MedLine.d199.s2.p0 We retrospectively reviewed the records of 7 pediatric oncology patients who received low-dose @DRUG$ in conjunction with @DRUG$ for total intravenous anesthesia (TIVA) repeatedly for radiation therapy. DDI-false 305 | DDI-MedLine.d200.s0.p0 [Efficacy of fixed combination @DRUG$/@DRUG$ in hospitalized patients with hypertensive disease] DDI-false 306 | DDI-MedLine.d200.s1.p0 Efficacy and tolerability of fixed @DRUG$/@DRUG$ combination was studied in 86 patients with hypertensive disease hospitalized in departments of general internal medicine or cardiology. DDI-false 307 | DDI-MedLine.d200.s2.p0 All patients had indications for antihypertensive therapy and were randomized either to fixed combination @DRUG$/@DRUG$ (n=43) or to therapy which corresponded to the hospital formulary (n=43). DDI-false 308 | DDI-MedLine.d200.s7.p0 Rate of achievement of target BP with fixed combination @DRUG$/@DRUG$ (93%) was comparable with that on traditional therapy (90%). DDI-false 309 | DDI-MedLine.d201.s4.p0 The aim of the present study was to investigate whether HB-EGF is a therapeutic target for T-ALL, and to further elucidate the antitumor effects of a specific inhibitor of HB-EGF, @DRUG$ (@DRUG$). DDI-false 310 | DDI-MedLine.d201.s5.p0 We elucidated the expression of HB-EGF in T-ALL cell lines, and evaluated the effect of @DRUG$ on these cells alone or in combination with @DRUG$. DDI-false 311 | DDI-MedLine.d204.s0.p0 Distinct synergistic action of @DRUG$ and @DRUG$ against Pseudomonas aeruginosa. DDI-false 312 | DDI-MedLine.d204.s9.p0 Distinct and statistically significant synergism was observed between @DRUG$ and @DRUG$ by disc diffusion tests when compared with their individual effects. DDI-effect 313 | DDI-MedLine.d208.s0.p0 @DRUG$ failure in an HIV-positive woman on @DRUG$ therapy resulting in two ectopic pregnancies. DDI-effect 314 | DDI-MedLine.d208.s3.p0 We present an interesting case of an HIV-positive woman on @DRUG$ therapy having tubal pregnancies on two separate occasions with @DRUG$ in place. DDI-effect 315 | DDI-MedLine.d209.s1.p0 @DRUG$, an @DRUG$, is the standard hormone treatment for breast cancer. DDI-false 316 | DDI-MedLine.d209.s10.p0 Depression does not always require @DRUG$ therapy, and @DRUG$ have no proven preventive impact on hot flushes linked to the menopause. 
DDI-false 317 | DDI-MedLine.d211.s2.p0 The preclinical combination of @DRUG$ with the mTOR inhibitor @DRUG$ has displayed synergy in vitro and represents a novel combination in MM. DDI-effect 318 | DDI-MedLine.d213.s0.p0 In vivo CYP3A activity is significantly lower in @DRUG$-treated as compared with @DRUG$-treated renal allograft recipients. DDI-false 319 | DDI-MedLine.d213.s1.p0 In vitro studies have identified @DRUG$ and @DRUG$ as CYP3A inhibitors. DDI-false 320 | DDI-MedLine.d214.s0.p0 [Influence of @DRUG$ and @DRUG$ on MPTP-evoked behavior violations in C57BL/6 mice]. DDI-false 321 | DDI-MedLine.d214.s5.p0 Acute administration of @DRUG$ or @DRUG$ failed to influence locomotion in mice, while their combination normalized motor activity. DDI-effect 322 | DDI-MedLine.d216.s5.p0 In the @DRUG$ and @DRUG$ phases, the areas under the effect-time curves (self-reported drowsiness and performance) were significantly higher than those in the placebo phase (P < 0.05). DDI-false 323 | DDI-MedLine.d216.s6.p0 The findings suggest that the dosage of @DRUG$ should be reduced in patients receiving @DRUG$. DDI-advise 324 | DDI-MedLine.d217.s0.p0 Interaction of @DRUG$ with different @DRUG$ is antagonistic in breast but not in other cancer cells. DDI-effect 325 | DDI-MedLine.d217.s11.p0 These results, if confirmed in-vivo, indicate that @DRUG$ is not a suitable chemosensitizer for breast cancer or with @DRUG$ for other cancers. DDI-false 326 | DDI-MedLine.d217.s2.p0 This study investigates the ability of @DRUG$ to sensitize cells from different origins to several @DRUG$. DDI-false 327 | DDI-MedLine.d218.s0.p0 @DRUG$-@DRUG$ interaction and associated gastrointestinal bleeding risk in a case-control study. DDI-false 328 | DDI-MedLine.d218.s3.p0 We sought to evaluate whether initiation of an @DRUG$ increases the risk of hospitalization for gastrointestinal bleeding in @DRUG$ users. DDI-false 329 | DDI-MedLine.d218.s5.p0 In total, 430,455 @DRUG$ users contributed 407,370 person-years of @DRUG$ use. DDI-false 330 | DDI-MedLine.d218.s9.p0 Also @DRUG$, which is not believed to interact with @DRUG$, increased the risk of GI bleeding (OR = 1.75 [95% CI, 1.30-2.35]). DDI-effect 331 | DDI-MedLine.d227.s0.p0 @DRUG$ enhances the antinociceptive effect of @DRUG$ in mice. DDI-effect 332 | DDI-MedLine.d227.s2.p0 The purpose of this study was to evaluate whether the psychostimulant drug @DRUG$ has any effect on @DRUG$-induced antinociception and locomotor inhibitor activity in mice in acute application. DDI-false 333 | DDI-MedLine.d227.s8.p0 At the same time, the locomotor inhibitory effect of @DRUG$ was counteracted by @DRUG$. DDI-effect 334 | DDI-MedLine.d231.s0.p0 Interaction study of @DRUG$ and @DRUG$ with co-administered drugs. DDI-false 335 | DDI-MedLine.d231.s6.p0 The response of @DRUG$ and @DRUG$ after interaction with co-administered drugs at different conditions and temperature were noted using a Shimadzu HPLC system with PDA detector. DDI-false 336 | DrugDDI.d21928724.s8.p0 @DRUG$ can be a useful @DRUG$ especially in cases which respond to drugs having mixed effects on serotonin and catecholamines levels in the brain. 
DDI-false 337 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Chemical-protein Interaction Extraction via Gaussian Probability Distribution and External Biomedical Knowledge 2 | 3 | 4 | ## Paper ## 5 | [Chemical-protein Interaction Extraction via Gaussian Probability Distribution and External Biomedical Knowledge](https://arxiv.org/abs/1911.09487) 6 | 7 | ## Model structure ## 8 | 9 | ![model structure](model.jpg) 10 | 11 | ## Codes ## 12 | In this repository, we provide the code of our proposed model. \ 13 | The 'pytorch_model.bin' files in Resources and Records can be obtained from [pytorch_models](https://drive.google.com/drive/folders/15o_h-_YQUgccvc9202hTGrSyZfrzOPFX?usp=sharing). 14 | ``` 15 | CPI Extraction 16 | -Resources 17 | -NCBI_BERT_pubmed_uncased_L-12_H-768_A-12 18 | -vocab.txt 19 | -pytorch_model.bin 20 | -bert_config.json 21 | -Records 22 | -record_76.56% 23 | -pytorch_model.bin 24 | -bert_config.json 25 | -eval_results.txt 26 | -test_results.txt 27 | -SourceCode 28 | -BioRE.py 29 | -BioRE_BG.py 30 | -modeling.py 31 | -modeling_BG.py 32 | -file_utils.py 33 | -ProcessedData 34 | -CHEMPROT 35 | -train.tsv 36 | -dev.tsv 37 | -test.tsv 38 | -test_overlapping.tsv 39 | -test_normal.tsv 40 | -DDIExtraction2013 41 | -train.tsv 42 | -dev.tsv 43 | -test.tsv 44 | -test_overlapping.tsv 45 | -test_normal.tsv 46 | ``` 47 | 48 | * Resources: provides the BERT model pre-trained on PubMed. 49 | * Records: provides a saved record of our proposed model. 50 | * SourceCode: provides the source code. BioRE.py implements our proposed model, and BioRE_BG.py implements the 'BERT+Gaussian' model. 51 | * ProcessedData: provides the CHEMPROT and DDIExtraction2013 datasets, including the overlapping and normal test instances. 52 | 53 | 54 | 55 | ## Tested environments ## 56 | 57 | * Ubuntu 16.04 58 | * python 3.6.9 59 | * pytorch-pretrained-bert 0.6.1 60 | * torch 1.1.0 61 | * numpy 1.16.4 62 | * pandas 0.25.0 63 | * scipy 1.3.1 64 | 65 | 66 | ## Run models ## 67 | Since the model contains multiple layers, it generally needs some time to train. If you do not have time to train the model, the saved model in Records can be loaded for testing. \ 68 | Some examples of execution instructions are listed below.
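As a quick preliminary check, the DDIExtraction2013 .tsv files shown above are tab-separated with index, sentence and label columns, and entity mentions are masked in the sentences (e.g. @DRUG$). Below is a minimal sketch for inspecting a split before training, assuming pandas (listed under Tested environments) is available and the repository root is the working directory; the chosen split is only an example.

```
import pandas as pd

# Inspect one processed split; the path is an example, adjust to the split you need.
df = pd.read_csv("ProcessedData/DDIExtraction2013/test_normal.tsv", sep="\t")

print(df.columns.tolist())         # expected: ['index', 'sentence', 'label']
print(df["label"].value_counts())  # label distribution, e.g. DDI-false, DDI-advise, ...
print(df.head(3))
```

Running the same check on the directory you pass as --data_dir is an easy way to confirm the paths in the commands below point at the intended files.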
69 | 70 | 71 | #### Run our proposed model #### 72 | ``` 73 | python BioRE.py \ 74 | --task_name cpi \ 75 | --do_train \ 76 | --do_eval \ 77 | --do_predict \ 78 | --do_lower_case \ 79 | --data_dir /$YourPath/CPI_extraction/ProcessedData/CHEMPROT \ 80 | --bert_model /$YourPath/CPI_extraction/Resources/NCBI_BERT_pubmed_uncased_L-12_H-768_A-12 \ 81 | --max_seq_length 128 \ 82 | --train_batch_size 16 \ 83 | --eval_batch_size 8 \ 84 | --predict_batch_size 8 \ 85 | --learning_rate 2e-5 \ 86 | --num_train_epochs 2.0 \ 87 | --seed 47 \ 88 | --output_dir /$YourOutputPath 89 | ``` 90 | #### Load the record of our proposed model #### 91 | ``` 92 | python BioRE.py \ 93 | --task_name cpi \ 94 | --do_eval \ 95 | --do_predict \ 96 | --do_lower_case \ 97 | --data_dir /$YourPath/CPI_extraction/ProcessedData/CHEMPROT \ 98 | --bert_model /$YourPath/CPI_extraction/Resources/NCBI_BERT_pubmed_uncased_L-12_H-768_A-12 \ 99 | --saved_model /$YourSavedmodelPath \ 100 | --max_seq_length 128 \ 101 | --train_batch_size 16 \ 102 | --eval_batch_size 8 \ 103 | --predict_batch_size 8 \ 104 | --learning_rate 2e-5 \ 105 | --num_train_epochs 2.0 \ 106 | --output_dir /$YourOutputPath 107 | ``` 108 | #### Run the 'BERT+Gaussian' model on the CHEMPROT dataset #### 109 | ``` 110 | python BioRE_BG.py \ 111 | --task_name cpi \ 112 | --do_train \ 113 | --do_eval \ 114 | --do_predict \ 115 | --do_lower_case \ 116 | --data_dir /$YourPath/CPI_extraction/ProcessedData/CHEMPROT \ 117 | --bert_model /$YourPath/CPI_extraction/Resources/NCBI_BERT_pubmed_uncased_L-12_H-768_A-12 \ 118 | --max_seq_length 128 \ 119 | --train_batch_size 16 \ 120 | --eval_batch_size 8 \ 121 | --predict_batch_size 8 \ 122 | --learning_rate 2e-5 \ 123 | --num_train_epochs 2.0 \ 124 | --seed 47 \ 125 | --output_dir /$YourOutputPath 126 | ``` 127 | #### Run the 'BERT+Gaussian' model on the DDIExtraction dataset #### 128 | ``` 129 | python BioRE_BG.py \ 130 | --task_name ddi \ 131 | --do_train \ 132 | --do_eval \ 133 | --do_predict \ 134 | --do_lower_case \ 135 | --data_dir /$YourPath/CPI_extraction/ProcessedData/DDIExtraction2013 \ 136 | --bert_model /$YourPath/CPI_extraction/Resources/NCBI_BERT_pubmed_uncased_L-12_H-768_A-12 \ 137 | --max_seq_length 128 \ 138 | --train_batch_size 16 \ 139 | --eval_batch_size 8 \ 140 | --predict_batch_size 8 \ 141 | --learning_rate 2e-5 \ 142 | --num_train_epochs 3.0 \ 143 | --seed 17 \ 144 | --output_dir /$YourOutputPath 145 | ``` 146 | 147 | ## Cite ## 148 | 149 | ``` 150 | @article{10.1093/bioinformatics/btaa491, 151 | author = {Sun, Cong and Yang, Zhihao and Su, Leilei and Wang, Lei and Zhang, Yin and Lin, Hongfei and Wang, Jian}, 152 | title = "{Chemical-protein Interaction Extraction via Gaussian Probability Distribution and External Biomedical Knowledge}", 153 | journal = {Bioinformatics}, 154 | year = {2020}, 155 | month = {05}, 156 | issn = {1367-4803}, 157 | doi = {10.1093/bioinformatics/btaa491}, 158 | url = {https://doi.org/10.1093/bioinformatics/btaa491}, 159 | note = {btaa491}, 160 | } 161 | ``` 162 | 163 | -------------------------------------------------------------------------------- /Records/record_76.56%/bert_config.json: -------------------------------------------------------------------------------- 1 | { 2 | "attention_probs_dropout_prob": 0.1, 3 | "hidden_act": "gelu", 4 | "hidden_dropout_prob": 0.1, 5 | "hidden_size": 768, 6 | "initializer_range": 0.02, 7 | "intermediate_size": 3072, 8 | "max_position_embeddings": 512, 9 | "num_attention_heads": 12, 10 | "num_hidden_layers": 12, 11 | "type_vocab_size": 2, 12 
| "vocab_size": 30522 13 | } 14 | -------------------------------------------------------------------------------- /Records/record_76.56%/eval_results.txt: -------------------------------------------------------------------------------- 1 | F1 = 0.7709061023217587 2 | P = 0.7654018767849857 3 | R = 0.7764900662251656 4 | eval_loss = 0.29068153011782727 5 | global_step = 2434 6 | loss = 0.13145572502983333 7 | -------------------------------------------------------------------------------- /Records/record_76.56%/test_results.txt: -------------------------------------------------------------------------------- 1 | F1 = 0.7656477438136826 2 | P = 0.7708089097303634 3 | R = 0.7605552342394447 4 | global_step = 2434 5 | loss = 0.13145572502983333 6 | test_loss = 0.3190204159669293 7 | -------------------------------------------------------------------------------- /Resources/NCBI_BERT_pubmed_uncased_L-12_H-768_A-12/bert_config.json: -------------------------------------------------------------------------------- 1 | { 2 | "attention_probs_dropout_prob": 0.1, 3 | "hidden_act": "gelu", 4 | "hidden_dropout_prob": 0.1, 5 | "hidden_size": 768, 6 | "initializer_range": 0.02, 7 | "intermediate_size": 3072, 8 | "max_position_embeddings": 512, 9 | "num_attention_heads": 12, 10 | "num_hidden_layers": 12, 11 | "type_vocab_size": 2, 12 | "vocab_size": 30522 13 | } 14 | -------------------------------------------------------------------------------- /SourceCode/BioRE.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | # Author: Cong Sun 3 | # Email: suncong132@mail.dlut.edu.cn 4 | # 5 | # BioRE 6 | # 7 | #-------------------------------------------------------------------------- 8 | 9 | 10 | 11 | from __future__ import absolute_import, division, print_function 12 | import argparse 13 | import csv 14 | import logging 15 | import os 16 | import random 17 | import sys 18 | import numpy as np 19 | import torch 20 | from torch.utils.data import (DataLoader, RandomSampler, SequentialSampler,TensorDataset) 21 | from torch.utils.data.distributed import DistributedSampler 22 | from tqdm import tqdm, trange 23 | from torch.nn import CrossEntropyLoss, MSELoss 24 | import scipy 25 | from scipy.stats import pearsonr, spearmanr 26 | from sklearn.metrics import matthews_corrcoef, f1_score, precision_recall_fscore_support 27 | 28 | import modeling 29 | from modeling import BertForSequenceClassification, BertConfig, WEIGHTS_NAME, CONFIG_NAME 30 | from pytorch_pretrained_bert.file_utils import PYTORCH_PRETRAINED_BERT_CACHE 31 | from pytorch_pretrained_bert.tokenization import BertTokenizer 32 | from pytorch_pretrained_bert.optimization import BertAdam, warmup_linear 33 | import os 34 | 35 | logging.basicConfig(format = '%(asctime)s - %(levelname)s - %(name)s - %(message)s', 36 | datefmt = '%m/%d/%Y %H:%M:%S', 37 | level = logging.INFO) 38 | logger = logging.getLogger(__name__) 39 | 40 | sigma = 2.5 41 | mu = 0.0 42 | 43 | norm_0=scipy.stats.norm(mu,sigma).cdf(1)-scipy.stats.norm(mu,sigma).cdf(0) 44 | norm_1=scipy.stats.norm(mu,sigma).cdf(2)-scipy.stats.norm(mu,sigma).cdf(1) 45 | norm_2=scipy.stats.norm(mu,sigma).cdf(3)-scipy.stats.norm(mu,sigma).cdf(2) 46 | norm_3=scipy.stats.norm(mu,sigma).cdf(4)-scipy.stats.norm(mu,sigma).cdf(3) 47 | norm_4=scipy.stats.norm(mu,sigma).cdf(5)-scipy.stats.norm(mu,sigma).cdf(4) 48 | norm_5=scipy.stats.norm(mu,sigma).cdf(6)-scipy.stats.norm(mu,sigma).cdf(5) 49 | norm_6=scipy.stats.norm(mu,sigma).cdf(7)-scipy.stats.norm(mu,sigma).cdf(6) 50 
| 51 | 52 | 53 | class InputExample(object): 54 | """A single training/test example for simple sequence classification.""" 55 | 56 | def __init__(self, guid, title, text_a, text_b=None, sdp=None, label=None): 57 | """Constructs a InputExample.""" 58 | self.guid = guid 59 | self.title = title 60 | self.text_a = text_a 61 | self.text_b = text_b 62 | self.sdp = sdp 63 | self.label = label 64 | 65 | 66 | class InputFeatures(object): 67 | """A single set of features of data.""" 68 | 69 | def __init__(self, title_ids, title_mask, title_segment, input_ids, input_mask, segment_ids, \ 70 | sdp_ids, sdp_mask, sdp_segment, P_gauss1_list, P_gauss2_list, label_id): 71 | 72 | self.title_ids=title_ids 73 | self.title_mask=title_mask 74 | self.title_segment = title_segment 75 | 76 | self.input_ids = input_ids 77 | self.input_mask = input_mask 78 | self.segment_ids = segment_ids 79 | 80 | self.sdp_ids = sdp_ids 81 | self.sdp_mask = sdp_mask 82 | self.sdp_segment = sdp_segment 83 | 84 | self.P_gauss1_list = P_gauss1_list 85 | self.P_gauss2_list = P_gauss2_list 86 | self.label_id = label_id 87 | 88 | 89 | class DataProcessor(object): 90 | """Base class for data converters for sequence classification data sets.""" 91 | 92 | def get_train_examples(self, data_dir): 93 | """Gets a collection of `InputExample`s for the train set.""" 94 | raise NotImplementedError() 95 | 96 | def get_dev_examples(self, data_dir): 97 | """Gets a collection of `InputExample`s for the dev set.""" 98 | raise NotImplementedError() 99 | 100 | def get_labels(self): 101 | """Gets the list of labels for this data set.""" 102 | raise NotImplementedError() 103 | 104 | @classmethod 105 | def _read_tsv(cls, input_file, quotechar=None): 106 | """Reads a tab separated value file.""" 107 | with open(input_file, "r") as f: 108 | reader = csv.reader(f, delimiter="\t", quotechar=quotechar) 109 | lines = [] 110 | for line in reader: 111 | if sys.version_info[0] == 2: 112 | line = list(unicode(cell, 'utf-8') for cell in line) 113 | lines.append(line) 114 | return lines 115 | 116 | 117 | class BioBERTChemprotProcessor(DataProcessor): 118 | """Processor for the BioBERT data set (GLUE version).""" 119 | 120 | def get_train_examples(self, data_dir): 121 | """See base class.""" 122 | return self._create_examples( 123 | self._read_tsv(os.path.join(data_dir, "train.tsv")), "train") 124 | 125 | def get_dev_examples(self, data_dir): 126 | """See base class.""" 127 | return self._create_examples( 128 | self._read_tsv(os.path.join(data_dir, "dev.tsv")), "dev") 129 | 130 | def get_test_examples(self, data_dir): 131 | """See base class.""" 132 | return self._create_examples( 133 | self._read_tsv(os.path.join(data_dir, "test.tsv")), "test") 134 | 135 | 136 | def get_labels(self): 137 | """See base class.""" 138 | return ["CPR:3", "CPR:4", "CPR:5", "CPR:6", "CPR:9", "false"] 139 | 140 | def _create_examples(self, lines, set_type): 141 | """Creates examples for the training and dev sets.""" 142 | examples = [] 143 | for (i, line) in enumerate(lines): 144 | if set_type == "train" and i == 0: 145 | continue 146 | if set_type == "dev" and i == 0: 147 | continue 148 | if set_type == "test" and i == 0: 149 | continue 150 | guid = "%s-%s" % (set_type, i) 151 | title = line[1] 152 | text_a = line[2] 153 | sdp = line[3] 154 | label = line[4] 155 | examples.append( 156 | InputExample(guid=guid, title=title, text_a=text_a, text_b=None, sdp=sdp, label=label)) 157 | return examples 158 | 159 | 160 | 161 | def convert_examples_to_features(examples, label_list, max_seq_length, 162 
| tokenizer, output_mode): 163 | """Loads a data file into a list of `InputBatch`s.""" 164 | 165 | label_map = {label : i for i, label in enumerate(label_list)} 166 | 167 | features = [] 168 | for (ex_index, example) in enumerate(examples): 169 | if ex_index % 10000 == 0: 170 | logger.info("Writing example %d of %d" % (ex_index, len(examples))) 171 | 172 | title = tokenizer.tokenize(example.title) 173 | if len(title) > max_seq_length - 2: 174 | title = title[:(max_seq_length - 2)] 175 | 176 | tokens_a = tokenizer.tokenize(example.text_a) 177 | 178 | tokens_b = None 179 | if example.text_b: 180 | tokens_b = tokenizer.tokenize(example.text_b) 181 | # Modifies `tokens_a` and `tokens_b` in place so that the total 182 | # length is less than the specified length. 183 | # Account for [CLS], [SEP], [SEP] with "- 3" 184 | _truncate_seq_pair(tokens_a, tokens_b, max_seq_length - 3) 185 | else: 186 | # Account for [CLS] and [SEP] with "- 2" 187 | if len(tokens_a) > max_seq_length - 2: 188 | tokens_a = tokens_a[:(max_seq_length - 2)] 189 | 190 | sdp = tokenizer.tokenize(example.sdp) 191 | if len(sdp) > max_seq_length - 2: 192 | sdp = sdp[:(max_seq_length - 2)] 193 | 194 | # The convention in BERT is: 195 | # (a) For sequence pairs: 196 | # tokens: [CLS] is this jack ##son ##ville ? [SEP] no it is not . [SEP] 197 | # type_ids: 0 0 0 0 0 0 0 0 1 1 1 1 1 1 198 | # (b) For single sequences: 199 | # tokens: [CLS] the dog is hairy . [SEP] 200 | # type_ids: 0 0 0 0 0 0 0 201 | # 202 | # Where "type_ids" are used to indicate whether this is the first 203 | # sequence or the second sequence. The embedding vectors for `type=0` and 204 | # `type=1` were learned during pre-training and are added to the wordpiece 205 | # embedding vector (and position vector). This is not *strictly* necessary 206 | # since the [SEP] token unambiguously separates the sequences, but it makes 207 | # it easier for the model to learn the concept of sequences. 208 | # 209 | # For classification tasks, the first vector (corresponding to [CLS]) is 210 | # used as as the "sentence vector". Note that this only makes sense because 211 | # the entire model is fine-tuned. 
212 | 213 | title = ["[CLS]"] + title + ["[SEP]"] 214 | tokens = ["[CLS]"] + tokens_a + ["[SEP]"] 215 | sdp = ["[CLS]"] + sdp + ["[SEP]"] 216 | 217 | title_segment = [0] * len(title) 218 | segment_ids = [0] * len(tokens) 219 | sdp_segment = [0] * len(sdp) 220 | 221 | if tokens_b: 222 | tokens += tokens_b + ["[SEP]"] 223 | segment_ids += [1] * (len(tokens_b) + 1) 224 | 225 | if '@' not in tokens: 226 | P_gauss1_array = np.zeros(max_seq_length) 227 | P_gauss2_array = np.zeros(max_seq_length) 228 | else: 229 | ann1_index=tokens.index('@') 230 | if '@' not in tokens[ann1_index+1:]: 231 | P_array = np.zeros(max_seq_length) 232 | P_array[ann1_index-1] = norm_1 233 | P_array[ann1_index] = norm_0 234 | P_array[ann1_index+1] = norm_0 235 | if ann1_index+2 < max_seq_length: 236 | P_array[ann1_index+2] = norm_1 237 | if ann1_index+3 < max_seq_length: 238 | P_array[ann1_index+3] = norm_2 239 | if ann1_index+4 < max_seq_length: 240 | P_array[ann1_index+4] = norm_3 241 | if ann1_index+5 < max_seq_length: 242 | P_array[ann1_index+5] = norm_4 243 | if ann1_index+6 < max_seq_length: 244 | P_array[ann1_index+6] = norm_5 245 | if ann1_index+7 < max_seq_length: 246 | P_array[ann1_index+7] = norm_6 247 | if ann1_index-2 >= 0: 248 | P_array[ann1_index-2] = norm_2 249 | if ann1_index-3 >= 0: 250 | P_array[ann1_index-3] = norm_3 251 | if ann1_index-4 >= 0: 252 | P_array[ann1_index-4] = norm_4 253 | if ann1_index-5 >= 0: 254 | P_array[ann1_index-5] = norm_5 255 | if ann1_index-6 >= 0: 256 | P_array[ann1_index-6] = norm_6 257 | P_gauss1_array = P_array 258 | P_gauss2_array = P_array 259 | else: 260 | ann2_index=tokens.index('@',ann1_index+1) 261 | P1_array = np.zeros(max_seq_length) 262 | P2_array = np.zeros(max_seq_length) 263 | 264 | P1_array[ann1_index-1] = norm_1 265 | P1_array[ann1_index] = norm_0 266 | P1_array[ann1_index+1] = norm_0 267 | if ann1_index+2 < max_seq_length: 268 | P1_array[ann1_index+2] = norm_1 269 | if ann1_index+3 < max_seq_length: 270 | P1_array[ann1_index+3] = norm_2 271 | if ann1_index+4 < max_seq_length: 272 | P1_array[ann1_index+4] = norm_3 273 | if ann1_index+5 < max_seq_length: 274 | P1_array[ann1_index+5] = norm_4 275 | if ann1_index+6 < max_seq_length: 276 | P1_array[ann1_index+6] = norm_5 277 | if ann1_index+7 < max_seq_length: 278 | P1_array[ann1_index+7] = norm_6 279 | if ann1_index-2 >=0: 280 | P1_array[ann1_index-2] = norm_2 281 | if ann1_index-3 >=0: 282 | P1_array[ann1_index-3] = norm_3 283 | if ann1_index-4 >=0: 284 | P1_array[ann1_index-4] = norm_4 285 | if ann1_index-5 >=0: 286 | P1_array[ann1_index-5] = norm_5 287 | if ann1_index-6 >=0: 288 | P1_array[ann1_index-6] = norm_6 289 | 290 | P2_array[ann2_index-1] = norm_1 291 | P2_array[ann2_index] = norm_0 292 | P2_array[ann2_index+1] = norm_0 293 | if ann2_index+2 < max_seq_length: 294 | P2_array[ann2_index+2] = norm_1 295 | if ann2_index+3 < max_seq_length: 296 | P2_array[ann2_index+3] = norm_2 297 | if ann2_index+4 < max_seq_length: 298 | P2_array[ann2_index+4] = norm_3 299 | if ann2_index+5 < max_seq_length: 300 | P2_array[ann2_index+5] = norm_4 301 | if ann2_index+6 < max_seq_length: 302 | P2_array[ann2_index+6] = norm_5 303 | if ann2_index+7 < max_seq_length: 304 | P2_array[ann2_index+7] = norm_6 305 | if ann2_index-2 >=0: 306 | P2_array[ann2_index-2] = norm_2 307 | if ann2_index-3 >= 0: 308 | P2_array[ann2_index-3] = norm_3 309 | if ann2_index-4 >= 0: 310 | P2_array[ann2_index-4] = norm_4 311 | if ann2_index-5 >= 0: 312 | P2_array[ann2_index-5] = norm_5 313 | if ann2_index-6 >= 0: 314 | P2_array[ann2_index-6] = norm_6 
315 | P_gauss1_array = P1_array 316 | P_gauss2_array = P2_array 317 | 318 | P_gauss1_list= list(P_gauss1_array) 319 | P_gauss2_list= list(P_gauss2_array) 320 | 321 | title_ids = tokenizer.convert_tokens_to_ids(title) 322 | input_ids = tokenizer.convert_tokens_to_ids(tokens) 323 | sdp_ids = tokenizer.convert_tokens_to_ids(sdp) 324 | 325 | # The mask has 1 for real tokens and 0 for padding tokens. Only real 326 | # tokens are attended to. 327 | title_mask = [1] * len(title_ids) 328 | input_mask = [1] * len(input_ids) 329 | sdp_mask = [1] * len(sdp_ids) 330 | 331 | # Zero-pad up to the sequence length. 332 | padding_title = [0] * (max_seq_length - len(title_ids)) 333 | title_ids += padding_title 334 | 335 | padding = [0] * (max_seq_length - len(input_ids)) 336 | input_ids += padding 337 | 338 | padding_sdp = [0] * (max_seq_length - len(sdp_ids)) 339 | sdp_ids += padding_sdp 340 | 341 | title_mask += padding_title 342 | input_mask += padding 343 | sdp_mask += padding_sdp 344 | 345 | title_segment += padding_title 346 | segment_ids += padding 347 | sdp_segment += padding_sdp 348 | 349 | assert len(title_ids) == max_seq_length 350 | assert len(title_mask) == max_seq_length 351 | assert len(title_segment) == max_seq_length 352 | assert len(input_ids) == max_seq_length 353 | assert len(input_mask) == max_seq_length 354 | assert len(segment_ids) == max_seq_length 355 | assert len(sdp_ids) == max_seq_length 356 | assert len(sdp_mask) == max_seq_length 357 | assert len(sdp_segment) == max_seq_length 358 | assert len(P_gauss1_list) == max_seq_length 359 | assert len(P_gauss2_list) == max_seq_length 360 | 361 | if output_mode == "classification": 362 | label_id = label_map[example.label] 363 | elif output_mode == "regression": 364 | label_id = float(example.label) 365 | else: 366 | raise KeyError(output_mode) 367 | 368 | if ex_index < 5: 369 | logger.info("*** Example ***") 370 | logger.info("guid: %s" % (example.guid)) 371 | logger.info("title: %s" % " ".join([str(x) for x in title])) 372 | logger.info("title_ids: %s" % " ".join([str(x) for x in title_ids])) 373 | logger.info("title_mask: %s" % " ".join([str(x) for x in title_mask])) 374 | logger.info("title_segment: %s" % " ".join([str(x) for x in title_segment])) 375 | 376 | logger.info("tokens: %s" % " ".join([str(x) for x in tokens])) 377 | logger.info("input_ids: %s" % " ".join([str(x) for x in input_ids])) 378 | logger.info("input_mask: %s" % " ".join([str(x) for x in input_mask])) 379 | logger.info("segment_ids: %s" % " ".join([str(x) for x in segment_ids])) 380 | 381 | logger.info("sdp: %s" % " ".join([str(x) for x in sdp])) 382 | logger.info("sdp_ids: %s" % " ".join([str(x) for x in sdp_ids])) 383 | logger.info("sdp_mask: %s" % " ".join([str(x) for x in sdp_mask])) 384 | logger.info("sdp_segment: %s" % " ".join([str(x) for x in sdp_segment])) 385 | 386 | logger.info("P_gauss1_list: %s" % " ".join([str(x) for x in P_gauss1_list])) 387 | logger.info("P_gauss2_list: %s" % " ".join([str(x) for x in P_gauss2_list])) 388 | logger.info("label: %s (id = %d)" % (example.label, label_id)) 389 | 390 | features.append( 391 | InputFeatures(title_ids=title_ids, 392 | title_mask=title_mask, 393 | title_segment=title_segment, 394 | input_ids=input_ids, 395 | input_mask=input_mask, 396 | segment_ids=segment_ids, 397 | sdp_ids=sdp_ids, 398 | sdp_mask=sdp_mask, 399 | sdp_segment=sdp_segment, 400 | P_gauss1_list=P_gauss1_list, 401 | P_gauss2_list=P_gauss2_list, 402 | label_id=label_id)) 403 | return features 404 | 405 | 406 | def _truncate_seq_pair(tokens_a, 
tokens_b, max_length): 407 | """Truncates a sequence pair in place to the maximum length.""" 408 | 409 | # This is a simple heuristic which will always truncate the longer sequence 410 | # one token at a time. This makes more sense than truncating an equal percent 411 | # of tokens from each, since if one sequence is very short then each token 412 | # that's truncated likely contains more information than a longer sequence. 413 | while True: 414 | total_length = len(tokens_a) + len(tokens_b) 415 | if total_length <= max_length: 416 | break 417 | if len(tokens_a) > len(tokens_b): 418 | tokens_a.pop() 419 | else: 420 | tokens_b.pop() 421 | 422 | 423 | def f1_chemprot(preds, labels): 424 | p,r,f,s = precision_recall_fscore_support(y_pred=preds, y_true=labels, labels=[0,1,2,3,4], average="micro") 425 | return { 426 | "P": p, 427 | "R": r, 428 | "F1": f, 429 | } 430 | 431 | 432 | def compute_metrics(task_name, preds, labels): 433 | assert len(preds) == len(labels) 434 | if task_name == "cpi": 435 | return f1_chemprot(preds, labels) 436 | else: 437 | raise KeyError(task_name) 438 | 439 | def main(): 440 | parser = argparse.ArgumentParser() 441 | 442 | ## Required parameters 443 | parser.add_argument("--data_dir", 444 | default=None, 445 | type=str, 446 | required=True, 447 | help="The input data dir. Should contain the .tsv files (or other data files) for the task.") 448 | parser.add_argument("--bert_model", default=None, type=str, required=True, 449 | help="Bert pre-trained model selected in the list: bert-base-uncased, " 450 | "bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, " 451 | "bert-base-multilingual-cased, bert-base-chinese.") 452 | parser.add_argument("--task_name", 453 | default=None, 454 | type=str, 455 | required=True, 456 | help="The name of the task to train.") 457 | parser.add_argument("--output_dir", 458 | default=None, 459 | type=str, 460 | required=True, 461 | help="The output directory where the model predictions and checkpoints will be written.") 462 | 463 | ## Other parameters 464 | parser.add_argument("--cache_dir", 465 | default="", 466 | type=str, 467 | help="Where do you want to store the pre-trained models downloaded from s3") 468 | parser.add_argument("--max_seq_length", 469 | default=128, 470 | type=int, 471 | help="The maximum total input sequence length after WordPiece tokenization. 
\n" 472 | "Sequences longer than this will be truncated, and sequences shorter \n" 473 | "than this will be padded.") 474 | parser.add_argument("--saved_model", default=None, type=str, 475 | help="saved model") 476 | parser.add_argument("--do_train", 477 | action='store_true', 478 | help="Whether to run training.") 479 | parser.add_argument("--do_eval", 480 | action='store_true', 481 | help="Whether to run eval on the dev set.") 482 | parser.add_argument("--do_predict", 483 | action='store_true', 484 | help="Whether to run test on the test set.") 485 | parser.add_argument("--do_lower_case", 486 | action='store_true', 487 | help="Set this flag if you are using an uncased model.") 488 | parser.add_argument("--train_batch_size", 489 | default=16, 490 | type=int, 491 | help="Total batch size for training.") 492 | parser.add_argument("--eval_batch_size", 493 | default=8, 494 | type=int, 495 | help="Total batch size for eval.") 496 | parser.add_argument("--predict_batch_size", 497 | default=8, 498 | type=int, 499 | help="Total batch size for predict.") 500 | parser.add_argument("--learning_rate", 501 | default=5e-5, 502 | type=float, 503 | help="The initial learning rate for Adam.") 504 | parser.add_argument("--num_train_epochs", 505 | default=2.0, 506 | type=float, 507 | help="Total number of training epochs to perform.") 508 | parser.add_argument("--warmup_proportion", 509 | default=0.1, 510 | type=float, 511 | help="Proportion of training to perform linear learning rate warmup for. " 512 | "E.g., 0.1 = 10%% of training.") 513 | parser.add_argument("--no_cuda", 514 | action='store_true', 515 | help="Whether not to use CUDA when available") 516 | parser.add_argument("--local_rank", 517 | type=int, 518 | default=-1, 519 | help="local_rank for distributed training on gpus") 520 | parser.add_argument('--seed', 521 | type=int, 522 | default=47, 523 | help="random seed for initialization") 524 | parser.add_argument('--gradient_accumulation_steps', 525 | type=int, 526 | default=1, 527 | help="Number of updates steps to accumulate before performing a backward/update pass.") 528 | parser.add_argument('--fp16', 529 | action='store_true', 530 | help="Whether to use 16-bit float precision instead of 32-bit") 531 | parser.add_argument('--loss_scale', 532 | type=float, default=0, 533 | help="Loss scaling to improve fp16 numeric stability. 
Only used when fp16 set to True.\n" 534 | "0 (default value): dynamic loss scaling.\n" 535 | "Positive power of 2: static loss scaling value.\n") 536 | parser.add_argument('--server_ip', type=str, default='', help="Can be used for distant debugging.") 537 | parser.add_argument('--server_port', type=str, default='', help="Can be used for distant debugging.") 538 | 539 | args = parser.parse_args() 540 | 541 | if args.server_ip and args.server_port: 542 | # Distant debugging - see https://code.visualstudio.com/docs/python/debugging#_attach-to-a-local-script 543 | import ptvsd 544 | print("Waiting for debugger attach") 545 | ptvsd.enable_attach(address=(args.server_ip, args.server_port), redirect_output=True) 546 | ptvsd.wait_for_attach() 547 | 548 | processors = { 549 | "cpi": BioBERTChemprotProcessor, 550 | } 551 | 552 | output_modes = { 553 | "cpi": "classification", 554 | } 555 | 556 | if args.local_rank == -1 or args.no_cuda: 557 | device = torch.device("cuda" if torch.cuda.is_available() and not args.no_cuda else "cpu") 558 | n_gpu = torch.cuda.device_count() 559 | else: 560 | torch.cuda.set_device(args.local_rank) 561 | device = torch.device("cuda", args.local_rank) 562 | n_gpu = 1 563 | # Initializes the distributed backend which will take care of sychronizing nodes/GPUs 564 | torch.distributed.init_process_group(backend='nccl') 565 | logger.info("device: {} n_gpu: {}, distributed training: {}, 16-bits training: {}".format( 566 | device, n_gpu, bool(args.local_rank != -1), args.fp16)) 567 | 568 | if args.gradient_accumulation_steps < 1: 569 | raise ValueError("Invalid gradient_accumulation_steps parameter: {}, should be >= 1".format( 570 | args.gradient_accumulation_steps)) 571 | 572 | args.train_batch_size = args.train_batch_size // args.gradient_accumulation_steps 573 | 574 | random.seed(args.seed) 575 | np.random.seed(args.seed) 576 | torch.manual_seed(args.seed) 577 | if n_gpu > 0: 578 | torch.cuda.manual_seed_all(args.seed) 579 | 580 | if not args.do_train and not args.do_eval: 581 | raise ValueError("At least one of `do_train` or `do_eval` must be True.") 582 | 583 | if os.path.exists(args.output_dir) and os.listdir(args.output_dir) and args.do_train: 584 | raise ValueError("Output directory ({}) already exists and is not empty.".format(args.output_dir)) 585 | if not os.path.exists(args.output_dir): 586 | os.makedirs(args.output_dir) 587 | 588 | task_name = args.task_name.lower() 589 | 590 | if task_name not in processors: 591 | raise ValueError("Task not found: %s" % (task_name)) 592 | 593 | processor = processors[task_name]() 594 | output_mode = output_modes[task_name] 595 | 596 | label_list = processor.get_labels() 597 | num_labels = len(label_list) 598 | 599 | tokenizer = BertTokenizer.from_pretrained(args.bert_model, do_lower_case=args.do_lower_case, 600 | never_split = ("[UNK]", "[SEP]", "[PAD]", "[CLS]", "[MASK]", "CPR:3", "CPR:4", "CPR:5", "CPR:6", "CPR:9")) 601 | 602 | tokenizer.vocab["CPR:3"] = 3 603 | tokenizer.vocab["CPR:4"] = 4 604 | tokenizer.vocab["CPR:5"] = 5 605 | tokenizer.vocab["CPR:6"] = 6 606 | tokenizer.vocab["CPR:9"] = 9 607 | train_examples = None 608 | num_train_optimization_steps = None 609 | 610 | 611 | if args.do_train: 612 | train_examples = processor.get_train_examples(args.data_dir) 613 | num_train_optimization_steps = int( 614 | len(train_examples) / args.train_batch_size / args.gradient_accumulation_steps) * args.num_train_epochs 615 | if args.local_rank != -1: 616 | num_train_optimization_steps = num_train_optimization_steps // 
torch.distributed.get_world_size() 617 | 618 | # Prepare model 619 | cache_dir = args.cache_dir if args.cache_dir else os.path.join(str(PYTORCH_PRETRAINED_BERT_CACHE), 'distributed_{}'.format(args.local_rank)) 620 | model = BertForSequenceClassification.from_pretrained(args.bert_model, 621 | cache_dir=cache_dir, 622 | num_labels=num_labels) 623 | if args.fp16: 624 | model.half() 625 | model.to(device) 626 | if args.local_rank != -1: 627 | try: 628 | from apex.parallel import DistributedDataParallel as DDP 629 | except ImportError: 630 | raise ImportError("Please install apex from https://www.github.com/nvidia/apex to use distributed and fp16 training.") 631 | 632 | model = DDP(model) 633 | elif n_gpu > 1: 634 | model = torch.nn.DataParallel(model) 635 | 636 | # Prepare optimizer 637 | param_optimizer = list(model.named_parameters()) 638 | no_decay = ['bias', 'LayerNorm.bias', 'LayerNorm.weight'] 639 | optimizer_grouped_parameters = [ 640 | {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)], 'weight_decay': 0.01}, 641 | {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)], 'weight_decay': 0.0} 642 | ] 643 | if args.fp16: 644 | try: 645 | from apex.optimizers import FP16_Optimizer 646 | from apex.optimizers import FusedAdam 647 | except ImportError: 648 | raise ImportError("Please install apex from https://www.github.com/nvidia/apex to use distributed and fp16 training.") 649 | 650 | optimizer = FusedAdam(optimizer_grouped_parameters, 651 | lr=args.learning_rate, 652 | bias_correction=False, 653 | max_grad_norm=1.0) 654 | if args.loss_scale == 0: 655 | optimizer = FP16_Optimizer(optimizer, dynamic_loss_scale=True) 656 | else: 657 | optimizer = FP16_Optimizer(optimizer, static_loss_scale=args.loss_scale) 658 | 659 | else: 660 | optimizer = BertAdam(optimizer_grouped_parameters, 661 | lr=args.learning_rate, 662 | warmup=args.warmup_proportion, 663 | t_total=num_train_optimization_steps, 664 | e=1e-08) 665 | 666 | global_step = 0 667 | nb_tr_steps = 0 668 | tr_loss = 0 669 | if args.do_train: 670 | train_features = convert_examples_to_features( 671 | train_examples, label_list, args.max_seq_length, tokenizer, output_mode) 672 | logger.info("***** Running training *****") 673 | logger.info(" Num examples = %d", len(train_examples)) 674 | logger.info(" Batch size = %d", args.train_batch_size) 675 | logger.info(" Num steps = %d", num_train_optimization_steps) 676 | all_title_ids = torch.tensor([f.title_ids for f in train_features], dtype=torch.long) 677 | all_title_mask = torch.tensor([f.title_mask for f in train_features], dtype=torch.long) 678 | all_title_segment = torch.tensor([f.title_segment for f in train_features], dtype=torch.long) 679 | 680 | all_input_ids = torch.tensor([f.input_ids for f in train_features], dtype=torch.long) 681 | all_input_mask = torch.tensor([f.input_mask for f in train_features], dtype=torch.long) 682 | all_segment_ids = torch.tensor([f.segment_ids for f in train_features], dtype=torch.long) 683 | 684 | all_sdp_ids = torch.tensor([f.sdp_ids for f in train_features], dtype=torch.long) 685 | all_sdp_mask = torch.tensor([f.sdp_mask for f in train_features], dtype=torch.long) 686 | all_sdp_segment = torch.tensor([f.sdp_segment for f in train_features], dtype=torch.long) 687 | 688 | all_P_gauss1_list = torch.tensor([f.P_gauss1_list for f in train_features], dtype=torch.float) 689 | all_P_gauss2_list = torch.tensor([f.P_gauss2_list for f in train_features], dtype=torch.float) 690 | 691 | if output_mode == 
"classification": 692 | all_label_ids = torch.tensor([f.label_id for f in train_features], dtype=torch.long) 693 | elif output_mode == "regression": 694 | all_label_ids = torch.tensor([f.label_id for f in train_features], dtype=torch.float) 695 | 696 | train_data = TensorDataset(all_title_ids, all_title_mask, all_title_segment, all_input_ids, all_input_mask, all_segment_ids, \ 697 | all_sdp_ids, all_sdp_mask, all_sdp_segment, all_P_gauss1_list, all_P_gauss2_list,all_label_ids) 698 | 699 | if args.local_rank == -1: 700 | train_sampler = RandomSampler(train_data) 701 | else: 702 | train_sampler = DistributedSampler(train_data) 703 | train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=args.train_batch_size) 704 | 705 | model.train() 706 | for _ in trange(int(args.num_train_epochs), desc="Epoch"): 707 | tr_loss = 0 708 | nb_tr_examples, nb_tr_steps = 0, 0 709 | for step, batch in enumerate(tqdm(train_dataloader, desc="Iteration")): 710 | batch = tuple(t.to(device) for t in batch) 711 | title_ids, title_mask, title_segment, input_ids, input_mask, segment_ids, \ 712 | sdp_ids, sdp_mask, sdp_segment, P_gauss1_list, P_gauss2_list, label_ids = batch 713 | 714 | # define a new function to compute loss values for both output_modes 715 | logits = model(title_ids, title_segment, title_mask, input_ids, segment_ids, input_mask, \ 716 | sdp_ids, sdp_segment, sdp_mask, P_gauss1_list, P_gauss2_list, labels=None) 717 | 718 | if output_mode == "classification": 719 | loss_fct = CrossEntropyLoss() 720 | loss = loss_fct(logits.view(-1, num_labels), label_ids.view(-1)) 721 | elif output_mode == "regression": 722 | loss_fct = MSELoss() 723 | loss = loss_fct(logits.view(-1), label_ids.view(-1)) 724 | 725 | if n_gpu > 1: 726 | loss = loss.mean() # mean() to average on multi-gpu. 
727 | if args.gradient_accumulation_steps > 1: 728 | loss = loss / args.gradient_accumulation_steps 729 | 730 | if args.fp16: 731 | optimizer.backward(loss) 732 | else: 733 | loss.backward() 734 | 735 | tr_loss += loss.item() 736 | nb_tr_examples += input_ids.size(0) 737 | nb_tr_steps += 1 738 | if (step + 1) % args.gradient_accumulation_steps == 0: 739 | if args.fp16: 740 | # modify learning rate with special warm up BERT uses 741 | # if args.fp16 is False, BertAdam is used that handles this automatically 742 | lr_this_step = args.learning_rate * warmup_linear(global_step/num_train_optimization_steps, args.warmup_proportion) 743 | for param_group in optimizer.param_groups: 744 | param_group['lr'] = lr_this_step 745 | optimizer.step() 746 | optimizer.zero_grad() 747 | global_step += 1 748 | 749 | # Save a trained model and the associated configuration 750 | model_to_save = model.module if hasattr(model, 'module') else model # Only save the model it-self 751 | output_model_file = os.path.join(args.output_dir, WEIGHTS_NAME) 752 | torch.save(model_to_save.state_dict(), output_model_file) 753 | output_config_file = os.path.join(args.output_dir, CONFIG_NAME) 754 | with open(output_config_file, 'w') as f: 755 | f.write(model_to_save.config.to_json_string()) 756 | 757 | else: 758 | output_model_file = os.path.join(args.saved_model, 'pytorch_model.bin') 759 | output_config_file = os.path.join(args.saved_model, 'bert_config.json') 760 | config = BertConfig(output_config_file) 761 | model = BertForSequenceClassification(config, num_labels=num_labels) 762 | model.load_state_dict(torch.load(output_model_file)) 763 | model.to(device) 764 | 765 | 766 | if args.do_eval and (args.local_rank == -1 or torch.distributed.get_rank() == 0): 767 | eval_examples = processor.get_dev_examples(args.data_dir) 768 | eval_features = convert_examples_to_features( 769 | eval_examples, label_list, args.max_seq_length, tokenizer, output_mode) 770 | logger.info("***** Running evaluation *****") 771 | logger.info(" Num examples = %d", len(eval_examples)) 772 | logger.info(" Batch size = %d", args.eval_batch_size) 773 | all_title_ids = torch.tensor([f.title_ids for f in eval_features], dtype=torch.long) 774 | all_title_mask = torch.tensor([f.title_mask for f in eval_features], dtype=torch.long) 775 | all_title_segment = torch.tensor([f.title_segment for f in eval_features], dtype=torch.long) 776 | 777 | all_input_ids = torch.tensor([f.input_ids for f in eval_features], dtype=torch.long) 778 | all_input_mask = torch.tensor([f.input_mask for f in eval_features], dtype=torch.long) 779 | all_segment_ids = torch.tensor([f.segment_ids for f in eval_features], dtype=torch.long) 780 | 781 | all_sdp_ids = torch.tensor([f.sdp_ids for f in eval_features], dtype=torch.long) 782 | all_sdp_mask = torch.tensor([f.sdp_mask for f in eval_features], dtype=torch.long) 783 | all_sdp_segment = torch.tensor([f.sdp_segment for f in eval_features], dtype=torch.long) 784 | 785 | all_P_gauss1_list = torch.tensor([f.P_gauss1_list for f in eval_features], dtype=torch.float) 786 | all_P_gauss2_list = torch.tensor([f.P_gauss2_list for f in eval_features], dtype=torch.float) 787 | 788 | if output_mode == "classification": 789 | all_label_ids = torch.tensor([f.label_id for f in eval_features], dtype=torch.long) 790 | elif output_mode == "regression": 791 | all_label_ids = torch.tensor([f.label_id for f in eval_features], dtype=torch.float) 792 | 793 | eval_data = TensorDataset(all_title_ids, all_title_mask, all_title_segment, all_input_ids, all_input_mask, 
all_segment_ids, \ 794 | all_sdp_ids, all_sdp_mask, all_sdp_segment, all_P_gauss1_list, all_P_gauss2_list, all_label_ids) 795 | 796 | # Run prediction for full data 797 | eval_sampler = SequentialSampler(eval_data) 798 | eval_dataloader = DataLoader(eval_data, sampler=eval_sampler, batch_size=args.eval_batch_size) 799 | 800 | model.eval() 801 | eval_loss = 0 802 | nb_eval_steps = 0 803 | preds = [] 804 | 805 | for title_ids, title_mask, title_segment, input_ids, input_mask, segment_ids, sdp_ids, sdp_mask, sdp_segment, P_gauss1_list, P_gauss2_list, label_ids in tqdm(eval_dataloader, desc="Evaluating"): 806 | title_ids = title_ids.to(device) 807 | title_mask = title_mask.to(device) 808 | title_segment = title_segment.to(device) 809 | input_ids = input_ids.to(device) 810 | input_mask = input_mask.to(device) 811 | segment_ids = segment_ids.to(device) 812 | sdp_ids = sdp_ids.to(device) 813 | sdp_mask = sdp_mask.to(device) 814 | sdp_segment = sdp_segment.to(device) 815 | P_gauss1_list = P_gauss1_list.to(device) 816 | P_gauss2_list = P_gauss2_list.to(device) 817 | label_ids = label_ids.to(device) 818 | 819 | with torch.no_grad(): 820 | logits = model(title_ids, title_segment, title_mask, input_ids, segment_ids, input_mask, sdp_ids, sdp_segment, sdp_mask, P_gauss1_list, P_gauss2_list, labels=None) 821 | 822 | # create eval loss and other metric required by the task 823 | if output_mode == "classification": 824 | loss_fct = CrossEntropyLoss() 825 | tmp_eval_loss = loss_fct(logits.view(-1, num_labels), label_ids.view(-1)) 826 | elif output_mode == "regression": 827 | loss_fct = MSELoss() 828 | tmp_eval_loss = loss_fct(logits.view(-1), label_ids.view(-1)) 829 | 830 | eval_loss += tmp_eval_loss.mean().item() 831 | nb_eval_steps += 1 832 | if len(preds) == 0: 833 | preds.append(logits.detach().cpu().numpy()) 834 | else: 835 | preds[0] = np.append( 836 | preds[0], logits.detach().cpu().numpy(), axis=0) 837 | 838 | eval_loss = eval_loss / nb_eval_steps 839 | preds = preds[0] 840 | if output_mode == "classification": 841 | preds = np.argmax(preds, axis=1) 842 | elif output_mode == "regression": 843 | preds = np.squeeze(preds) 844 | result = compute_metrics(task_name, preds, all_label_ids.numpy()) 845 | loss = tr_loss/nb_tr_steps if args.do_train else None 846 | 847 | result['eval_loss'] = eval_loss 848 | result['global_step'] = global_step 849 | result['loss'] = loss 850 | 851 | output_eval_file = os.path.join(args.output_dir, "eval_results.txt") 852 | with open(output_eval_file, "w") as writer: 853 | logger.info("***** Eval results *****") 854 | for key in sorted(result.keys()): 855 | logger.info(" %s = %s", key, str(result[key])) 856 | writer.write("%s = %s\n" % (key, str(result[key]))) 857 | 858 | 859 | if args.do_predict and (args.local_rank == -1 or torch.distributed.get_rank() == 0): 860 | test_examples = processor.get_test_examples(args.data_dir) 861 | test_features = convert_examples_to_features( 862 | test_examples, label_list, args.max_seq_length, tokenizer, output_mode) 863 | logger.info("***** Running prediction *****") 864 | logger.info(" Num examples = %d", len(test_examples)) 865 | logger.info(" Batch size = %d", args.predict_batch_size) 866 | all_title_ids = torch.tensor([f.title_ids for f in test_features], dtype=torch.long) 867 | all_title_mask = torch.tensor([f.title_mask for f in test_features], dtype=torch.long) 868 | all_title_segment = torch.tensor([f.title_segment for f in test_features], dtype=torch.long) 869 | 870 | all_input_ids = torch.tensor([f.input_ids for f in 
test_features], dtype=torch.long) 871 | all_input_mask = torch.tensor([f.input_mask for f in test_features], dtype=torch.long) 872 | all_segment_ids = torch.tensor([f.segment_ids for f in test_features], dtype=torch.long) 873 | 874 | all_sdp_ids = torch.tensor([f.sdp_ids for f in test_features], dtype=torch.long) 875 | all_sdp_mask = torch.tensor([f.sdp_mask for f in test_features], dtype=torch.long) 876 | all_sdp_segment = torch.tensor([f.sdp_segment for f in test_features], dtype=torch.long) 877 | 878 | all_P_gauss1_list = torch.tensor([f.P_gauss1_list for f in test_features], dtype=torch.float) 879 | all_P_gauss2_list = torch.tensor([f.P_gauss2_list for f in test_features], dtype=torch.float) 880 | 881 | if output_mode == "classification": 882 | all_label_ids = torch.tensor([f.label_id for f in test_features], dtype=torch.long) 883 | elif output_mode == "regression": 884 | all_label_ids = torch.tensor([f.label_id for f in test_features], dtype=torch.float) 885 | 886 | predict_data = TensorDataset(all_title_ids, all_title_mask, all_title_segment, all_input_ids, all_input_mask, all_segment_ids, all_sdp_ids, all_sdp_mask, all_sdp_segment, all_P_gauss1_list, all_P_gauss2_list, all_label_ids) 887 | 888 | # Run prediction for full data 889 | predict_sampler = SequentialSampler(predict_data) 890 | predict_dataloader = DataLoader(predict_data, sampler=predict_sampler, batch_size=args.predict_batch_size) 891 | 892 | model.eval() 893 | test_loss = 0 894 | nb_test_steps = 0 895 | preds = [] 896 | 897 | for title_ids, title_mask, title_segment, input_ids, input_mask, segment_ids, sdp_ids, sdp_mask, sdp_segment, P_gauss1_list, P_gauss2_list,label_ids in tqdm(predict_dataloader, desc="Predicting"): 898 | title_ids = title_ids.to(device) 899 | title_mask = title_mask.to(device) 900 | title_segment = title_segment.to(device) 901 | input_ids = input_ids.to(device) 902 | input_mask = input_mask.to(device) 903 | segment_ids = segment_ids.to(device) 904 | sdp_ids = sdp_ids.to(device) 905 | sdp_mask = sdp_mask.to(device) 906 | sdp_segment = sdp_segment.to(device) 907 | P_gauss1_list = P_gauss1_list.to(device) 908 | P_gauss2_list = P_gauss2_list.to(device) 909 | label_ids = label_ids.to(device) 910 | 911 | 912 | with torch.no_grad(): 913 | logits = model(title_ids, title_segment, title_mask, input_ids, segment_ids, input_mask,\ 914 | sdp_ids, sdp_segment, sdp_mask, P_gauss1_list, P_gauss2_list, labels=None) 915 | 916 | # create eval loss and other metric required by the task 917 | if output_mode == "classification": 918 | loss_fct = CrossEntropyLoss() 919 | tmp_test_loss = loss_fct(logits.view(-1, num_labels), label_ids.view(-1)) 920 | 921 | elif output_mode == "regression": 922 | loss_fct = MSELoss() 923 | tmp_test_loss = loss_fct(logits.view(-1), label_ids.view(-1)) 924 | 925 | test_loss += tmp_test_loss.mean().item() 926 | nb_test_steps += 1 927 | if len(preds) == 0: 928 | preds.append(logits.detach().cpu().numpy()) 929 | else: 930 | preds[0] = np.append( 931 | preds[0], logits.detach().cpu().numpy(), axis=0) 932 | 933 | test_loss = test_loss / nb_test_steps 934 | preds = preds[0] 935 | if output_mode == "classification": 936 | preds = np.argmax(preds, axis=1) 937 | elif output_mode == "regression": 938 | preds = np.squeeze(preds) 939 | result = compute_metrics(task_name, preds, all_label_ids.numpy()) 940 | loss = tr_loss/nb_tr_steps if args.do_train else None 941 | 942 | result['test_loss'] = test_loss 943 | result['global_step'] = global_step 944 | result['loss'] = loss 945 | 946 | output_test_file = 
os.path.join(args.output_dir, "test_results.txt") 947 | with open(output_test_file, "w") as writer: 948 | logger.info("***** Predict results *****") 949 | for key in sorted(result.keys()): 950 | logger.info(" %s = %s", key, str(result[key])) 951 | writer.write("%s = %s\n" % (key, str(result[key]))) 952 | 953 | 954 | if __name__ == "__main__": 955 | main() -------------------------------------------------------------------------------- /SourceCode/BioRE_BG.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | # Author: Cong Sun 3 | # Email: suncong132@mail.dlut.edu.cn 4 | # 5 | # BioRE_BG 6 | # 7 | #-------------------------------------------------------------------------- 8 | 9 | 10 | 11 | from __future__ import absolute_import, division, print_function 12 | import argparse 13 | import csv 14 | import logging 15 | import os 16 | import random 17 | import sys 18 | import numpy as np 19 | import torch 20 | from torch.utils.data import (DataLoader, RandomSampler, SequentialSampler,TensorDataset) 21 | from torch.utils.data.distributed import DistributedSampler 22 | from tqdm import tqdm, trange 23 | from torch.nn import CrossEntropyLoss, MSELoss 24 | import scipy 25 | from scipy.stats import pearsonr, spearmanr 26 | from sklearn.metrics import matthews_corrcoef, f1_score, precision_recall_fscore_support 27 | 28 | import modeling_BG 29 | from modeling_BG import BertForSequenceClassification, BertConfig, WEIGHTS_NAME, CONFIG_NAME 30 | from pytorch_pretrained_bert.file_utils import PYTORCH_PRETRAINED_BERT_CACHE 31 | from pytorch_pretrained_bert.tokenization import BertTokenizer 32 | from pytorch_pretrained_bert.optimization import BertAdam, warmup_linear 33 | import os 34 | 35 | logging.basicConfig(format = '%(asctime)s - %(levelname)s - %(name)s - %(message)s', 36 | datefmt = '%m/%d/%Y %H:%M:%S', 37 | level = logging.INFO) 38 | logger = logging.getLogger(__name__) 39 | 40 | sigma = 2.5 41 | mu = 0.0 42 | 43 | norm_0=scipy.stats.norm(mu,sigma).cdf(1)-scipy.stats.norm(mu,sigma).cdf(0) 44 | norm_1=scipy.stats.norm(mu,sigma).cdf(2)-scipy.stats.norm(mu,sigma).cdf(1) 45 | norm_2=scipy.stats.norm(mu,sigma).cdf(3)-scipy.stats.norm(mu,sigma).cdf(2) 46 | norm_3=scipy.stats.norm(mu,sigma).cdf(4)-scipy.stats.norm(mu,sigma).cdf(3) 47 | norm_4=scipy.stats.norm(mu,sigma).cdf(5)-scipy.stats.norm(mu,sigma).cdf(4) 48 | norm_5=scipy.stats.norm(mu,sigma).cdf(6)-scipy.stats.norm(mu,sigma).cdf(5) 49 | norm_6=scipy.stats.norm(mu,sigma).cdf(7)-scipy.stats.norm(mu,sigma).cdf(6) 50 | 51 | 52 | 53 | class InputExample(object): 54 | """A single training/test example for simple sequence classification.""" 55 | 56 | def __init__(self, guid, text_a, text_b=None, label=None): 57 | """Constructs a InputExample.""" 58 | self.guid = guid 59 | self.text_a = text_a 60 | self.text_b = text_b 61 | self.label = label 62 | 63 | 64 | class InputFeatures(object): 65 | """A single set of features of data.""" 66 | 67 | def __init__(self, input_ids, input_mask, segment_ids, P_gauss1_list, P_gauss2_list, label_id): 68 | 69 | self.input_ids = input_ids 70 | self.input_mask = input_mask 71 | self.segment_ids = segment_ids 72 | 73 | self.P_gauss1_list = P_gauss1_list 74 | self.P_gauss2_list = P_gauss2_list 75 | self.label_id = label_id 76 | 77 | 78 | class DataProcessor(object): 79 | """Base class for data converters for sequence classification data sets.""" 80 | 81 | def get_train_examples(self, data_dir): 82 | """Gets a collection of `InputExample`s for the train set.""" 83 | raise 
NotImplementedError() 84 | 85 | def get_dev_examples(self, data_dir): 86 | """Gets a collection of `InputExample`s for the dev set.""" 87 | raise NotImplementedError() 88 | 89 | def get_labels(self): 90 | """Gets the list of labels for this data set.""" 91 | raise NotImplementedError() 92 | 93 | @classmethod 94 | def _read_tsv(cls, input_file, quotechar=None): 95 | """Reads a tab separated value file.""" 96 | with open(input_file, "r") as f: 97 | reader = csv.reader(f, delimiter="\t", quotechar=quotechar) 98 | lines = [] 99 | for line in reader: 100 | if sys.version_info[0] == 2: 101 | line = list(unicode(cell, 'utf-8') for cell in line) 102 | lines.append(line) 103 | return lines 104 | 105 | 106 | class BioBERTChemprotProcessor(DataProcessor): 107 | """Processor for the BioBERT data set (GLUE version).""" 108 | 109 | def get_train_examples(self, data_dir): 110 | """See base class.""" 111 | return self._create_examples( 112 | self._read_tsv(os.path.join(data_dir, "train.tsv")), "train") 113 | 114 | def get_dev_examples(self, data_dir): 115 | """See base class.""" 116 | return self._create_examples( 117 | self._read_tsv(os.path.join(data_dir, "dev.tsv")), "dev") 118 | 119 | def get_test_examples(self, data_dir): 120 | """See base class.""" 121 | return self._create_examples( 122 | self._read_tsv(os.path.join(data_dir, "test.tsv")), "test") 123 | 124 | 125 | def get_labels(self): 126 | """See base class.""" 127 | return ["CPR:3", "CPR:4", "CPR:5", "CPR:6", "CPR:9", "false"] 128 | 129 | def _create_examples(self, lines, set_type): 130 | """Creates examples for the training and dev sets.""" 131 | examples = [] 132 | for (i, line) in enumerate(lines): 133 | if set_type == "train" and i == 0: 134 | continue 135 | if set_type == "dev" and i == 0: 136 | continue 137 | if set_type == "test" and i == 0: 138 | continue 139 | guid = "%s-%s" % (set_type, i) 140 | # title = line[1] 141 | text_a = line[2] 142 | # sdp = line[3] 143 | label = line[4] 144 | examples.append( 145 | InputExample(guid=guid, text_a=text_a, text_b=None, label=label)) 146 | return examples 147 | 148 | 149 | class BioBERT_DDIProcessor(DataProcessor): 150 | """Processor for the BioBERT data set (GLUE version).""" 151 | 152 | def get_train_examples(self, data_dir): 153 | """See base class.""" 154 | return self._create_examples( 155 | self._read_tsv(os.path.join(data_dir, "train.tsv")), "train") 156 | 157 | def get_dev_examples(self, data_dir): 158 | """See base class.""" 159 | return self._create_examples( 160 | self._read_tsv(os.path.join(data_dir, "dev.tsv")), "dev") 161 | 162 | def get_test_examples(self, data_dir): 163 | """See base class.""" 164 | return self._create_examples( 165 | self._read_tsv(os.path.join(data_dir, "test.tsv")), "test") 166 | 167 | def get_labels(self): 168 | """See base class.""" 169 | return ["DDI-advise", "DDI-effect", "DDI-mechanism", "DDI-int", "DDI-false"] 170 | 171 | def _create_examples(self, lines, set_type): 172 | """Creates examples for the training and dev sets.""" 173 | examples = [] 174 | for (i, line) in enumerate(lines): 175 | if set_type == "train" and i == 0: 176 | continue 177 | if set_type == "dev" and i == 0: 178 | continue 179 | if set_type == "test" and i == 0: 180 | continue 181 | guid = "%s-%s" % (set_type, i) 182 | text_a = line[1] 183 | label = line[2] 184 | examples.append( 185 | InputExample(guid=guid, text_a=text_a, text_b=None, label=label)) 186 | return examples 187 | 188 | 189 | def convert_examples_to_features(examples, label_list, max_seq_length, 190 | tokenizer, 
output_mode): 191 | """Loads a data file into a list of `InputBatch`s.""" 192 | 193 | label_map = {label : i for i, label in enumerate(label_list)} 194 | 195 | features = [] 196 | for (ex_index, example) in enumerate(examples): 197 | if ex_index % 10000 == 0: 198 | logger.info("Writing example %d of %d" % (ex_index, len(examples))) 199 | 200 | tokens_a = tokenizer.tokenize(example.text_a) 201 | 202 | tokens_b = None 203 | if example.text_b: 204 | tokens_b = tokenizer.tokenize(example.text_b) 205 | # Modifies `tokens_a` and `tokens_b` in place so that the total 206 | # length is less than the specified length. 207 | # Account for [CLS], [SEP], [SEP] with "- 3" 208 | _truncate_seq_pair(tokens_a, tokens_b, max_seq_length - 3) 209 | else: 210 | # Account for [CLS] and [SEP] with "- 2" 211 | if len(tokens_a) > max_seq_length - 2: 212 | tokens_a = tokens_a[:(max_seq_length - 2)] 213 | 214 | tokens = ["[CLS]"] + tokens_a + ["[SEP]"] 215 | segment_ids = [0] * len(tokens) 216 | 217 | if tokens_b: 218 | tokens += tokens_b + ["[SEP]"] 219 | segment_ids += [1] * (len(tokens_b) + 1) 220 | 221 | if '@' not in tokens: 222 | P_gauss1_array = np.zeros(max_seq_length) 223 | P_gauss2_array = np.zeros(max_seq_length) 224 | else: 225 | ann1_index=tokens.index('@') 226 | if '@' not in tokens[ann1_index+1:]: 227 | P_array = np.zeros(max_seq_length) 228 | P_array[ann1_index-1] = norm_1 229 | P_array[ann1_index] = norm_0 230 | P_array[ann1_index+1] = norm_0 231 | if ann1_index+2 < max_seq_length: 232 | P_array[ann1_index+2] = norm_1 233 | if ann1_index+3 < max_seq_length: 234 | P_array[ann1_index+3] = norm_2 235 | if ann1_index+4 < max_seq_length: 236 | P_array[ann1_index+4] = norm_3 237 | if ann1_index+5 < max_seq_length: 238 | P_array[ann1_index+5] = norm_4 239 | if ann1_index+6 < max_seq_length: 240 | P_array[ann1_index+6] = norm_5 241 | if ann1_index+7 < max_seq_length: 242 | P_array[ann1_index+7] = norm_6 243 | if ann1_index-2 >= 0: 244 | P_array[ann1_index-2] = norm_2 245 | if ann1_index-3 >= 0: 246 | P_array[ann1_index-3] = norm_3 247 | if ann1_index-4 >= 0: 248 | P_array[ann1_index-4] = norm_4 249 | if ann1_index-5 >= 0: 250 | P_array[ann1_index-5] = norm_5 251 | if ann1_index-6 >= 0: 252 | P_array[ann1_index-6] = norm_6 253 | P_gauss1_array = P_array 254 | P_gauss2_array = P_array 255 | else: 256 | ann2_index=tokens.index('@',ann1_index+1) 257 | P1_array = np.zeros(max_seq_length) 258 | P2_array = np.zeros(max_seq_length) 259 | 260 | P1_array[ann1_index-1] = norm_1 261 | P1_array[ann1_index] = norm_0 262 | P1_array[ann1_index+1] = norm_0 263 | if ann1_index+2 < max_seq_length: 264 | P1_array[ann1_index+2] = norm_1 265 | if ann1_index+3 < max_seq_length: 266 | P1_array[ann1_index+3] = norm_2 267 | if ann1_index+4 < max_seq_length: 268 | P1_array[ann1_index+4] = norm_3 269 | if ann1_index+5 < max_seq_length: 270 | P1_array[ann1_index+5] = norm_4 271 | if ann1_index+6 < max_seq_length: 272 | P1_array[ann1_index+6] = norm_5 273 | if ann1_index+7 < max_seq_length: 274 | P1_array[ann1_index+7] = norm_6 275 | if ann1_index-2 >=0: 276 | P1_array[ann1_index-2] = norm_2 277 | if ann1_index-3 >=0: 278 | P1_array[ann1_index-3] = norm_3 279 | if ann1_index-4 >=0: 280 | P1_array[ann1_index-4] = norm_4 281 | if ann1_index-5 >=0: 282 | P1_array[ann1_index-5] = norm_5 283 | if ann1_index-6 >=0: 284 | P1_array[ann1_index-6] = norm_6 285 | 286 | P2_array[ann2_index-1] = norm_1 287 | P2_array[ann2_index] = norm_0 288 | P2_array[ann2_index+1] = norm_0 289 | if ann2_index+2 < max_seq_length: 290 | P2_array[ann2_index+2] = 
norm_1 291 | if ann2_index+3 < max_seq_length: 292 | P2_array[ann2_index+3] = norm_2 293 | if ann2_index+4 < max_seq_length: 294 | P2_array[ann2_index+4] = norm_3 295 | if ann2_index+5 < max_seq_length: 296 | P2_array[ann2_index+5] = norm_4 297 | if ann2_index+6 < max_seq_length: 298 | P2_array[ann2_index+6] = norm_5 299 | if ann2_index+7 < max_seq_length: 300 | P2_array[ann2_index+7] = norm_6 301 | if ann2_index-2 >=0: 302 | P2_array[ann2_index-2] = norm_2 303 | if ann2_index-3 >= 0: 304 | P2_array[ann2_index-3] = norm_3 305 | if ann2_index-4 >= 0: 306 | P2_array[ann2_index-4] = norm_4 307 | if ann2_index-5 >= 0: 308 | P2_array[ann2_index-5] = norm_5 309 | if ann2_index-6 >= 0: 310 | P2_array[ann2_index-6] = norm_6 311 | P_gauss1_array = P1_array 312 | P_gauss2_array = P2_array 313 | 314 | P_gauss1_list= list(P_gauss1_array) 315 | P_gauss2_list= list(P_gauss2_array) 316 | 317 | 318 | input_ids = tokenizer.convert_tokens_to_ids(tokens) 319 | # The mask has 1 for real tokens and 0 for padding tokens. Only real 320 | # tokens are attended to. 321 | input_mask = [1] * len(input_ids) 322 | 323 | 324 | # Zero-pad up to the sequence length. 325 | padding = [0] * (max_seq_length - len(input_ids)) 326 | input_ids += padding 327 | input_mask += padding 328 | segment_ids += padding 329 | 330 | assert len(input_ids) == max_seq_length 331 | assert len(input_mask) == max_seq_length 332 | assert len(segment_ids) == max_seq_length 333 | assert len(P_gauss1_list) == max_seq_length 334 | assert len(P_gauss2_list) == max_seq_length 335 | 336 | if output_mode == "classification": 337 | label_id = label_map[example.label] 338 | elif output_mode == "regression": 339 | label_id = float(example.label) 340 | else: 341 | raise KeyError(output_mode) 342 | 343 | if ex_index < 5: 344 | logger.info("*** Example ***") 345 | logger.info("guid: %s" % (example.guid)) 346 | logger.info("tokens: %s" % " ".join([str(x) for x in tokens])) 347 | logger.info("input_ids: %s" % " ".join([str(x) for x in input_ids])) 348 | logger.info("input_mask: %s" % " ".join([str(x) for x in input_mask])) 349 | logger.info("segment_ids: %s" % " ".join([str(x) for x in segment_ids])) 350 | logger.info("P_gauss1_list: %s" % " ".join([str(x) for x in P_gauss1_list])) 351 | logger.info("P_gauss2_list: %s" % " ".join([str(x) for x in P_gauss2_list])) 352 | logger.info("label: %s (id = %d)" % (example.label, label_id)) 353 | 354 | features.append( 355 | InputFeatures(input_ids=input_ids, 356 | input_mask=input_mask, 357 | segment_ids=segment_ids, 358 | P_gauss1_list=P_gauss1_list, 359 | P_gauss2_list=P_gauss2_list, 360 | label_id=label_id)) 361 | return features 362 | 363 | 364 | def _truncate_seq_pair(tokens_a, tokens_b, max_length): 365 | """Truncates a sequence pair in place to the maximum length.""" 366 | 367 | # This is a simple heuristic which will always truncate the longer sequence 368 | # one token at a time. This makes more sense than truncating an equal percent 369 | # of tokens from each, since if one sequence is very short then each token 370 | # that's truncated likely contains more information than a longer sequence. 
371 | while True: 372 | total_length = len(tokens_a) + len(tokens_b) 373 | if total_length <= max_length: 374 | break 375 | if len(tokens_a) > len(tokens_b): 376 | tokens_a.pop() 377 | else: 378 | tokens_b.pop() 379 | 380 | def f1_chemprot(preds, labels): 381 | p,r,f,s = precision_recall_fscore_support(y_pred=preds, y_true=labels, labels=[0,1,2,3,4], average="micro") 382 | return { 383 | "P": p, 384 | "R": r, 385 | "F1": f, 386 | } 387 | 388 | def f1_ddi(preds, labels): 389 | p,r,f,s = precision_recall_fscore_support(y_pred=preds, y_true=labels, labels=[0,1,2,3], average="micro") 390 | return { 391 | "P": p, 392 | "R": r, 393 | "F1": f, 394 | } 395 | 396 | def compute_metrics(task_name, preds, labels): 397 | assert len(preds) == len(labels) 398 | if task_name == "cpi": 399 | return f1_chemprot(preds, labels) 400 | elif task_name == "ddi": 401 | return f1_ddi(preds, labels) 402 | else: 403 | raise KeyError(task_name) 404 | 405 | def main(): 406 | parser = argparse.ArgumentParser() 407 | 408 | ## Required parameters 409 | parser.add_argument("--data_dir", 410 | default=None, 411 | type=str, 412 | required=True, 413 | help="The input data dir. Should contain the .tsv files (or other data files) for the task.") 414 | parser.add_argument("--bert_model", default=None, type=str, required=True, 415 | help="Bert pre-trained model selected in the list: bert-base-uncased, " 416 | "bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, " 417 | "bert-base-multilingual-cased, bert-base-chinese.") 418 | parser.add_argument("--task_name", 419 | default=None, 420 | type=str, 421 | required=True, 422 | help="The name of the task to train.") 423 | parser.add_argument("--output_dir", 424 | default=None, 425 | type=str, 426 | required=True, 427 | help="The output directory where the model predictions and checkpoints will be written.") 428 | 429 | ## Other parameters 430 | parser.add_argument("--cache_dir", 431 | default="", 432 | type=str, 433 | help="Where do you want to store the pre-trained models downloaded from s3") 434 | parser.add_argument("--max_seq_length", 435 | default=128, 436 | type=int, 437 | help="The maximum total input sequence length after WordPiece tokenization. 
\n" 438 | "Sequences longer than this will be truncated, and sequences shorter \n" 439 | "than this will be padded.") 440 | parser.add_argument("--saved_model", default=None, type=str, 441 | help="saved model") 442 | parser.add_argument("--do_train", 443 | action='store_true', 444 | help="Whether to run training.") 445 | parser.add_argument("--do_eval", 446 | action='store_true', 447 | help="Whether to run eval on the dev set.") 448 | parser.add_argument("--do_predict", 449 | action='store_true', 450 | help="Whether to run test on the test set.") 451 | parser.add_argument("--do_lower_case", 452 | action='store_true', 453 | help="Set this flag if you are using an uncased model.") 454 | parser.add_argument("--train_batch_size", 455 | default=16, 456 | type=int, 457 | help="Total batch size for training.") 458 | parser.add_argument("--eval_batch_size", 459 | default=8, 460 | type=int, 461 | help="Total batch size for eval.") 462 | parser.add_argument("--predict_batch_size", 463 | default=8, 464 | type=int, 465 | help="Total batch size for predict.") 466 | parser.add_argument("--learning_rate", 467 | default=5e-5, 468 | type=float, 469 | help="The initial learning rate for Adam.") 470 | parser.add_argument("--num_train_epochs", 471 | default=2.0, 472 | type=float, 473 | help="Total number of training epochs to perform.") 474 | parser.add_argument("--warmup_proportion", 475 | default=0.1, 476 | type=float, 477 | help="Proportion of training to perform linear learning rate warmup for. " 478 | "E.g., 0.1 = 10%% of training.") 479 | parser.add_argument("--no_cuda", 480 | action='store_true', 481 | help="Whether not to use CUDA when available") 482 | parser.add_argument("--local_rank", 483 | type=int, 484 | default=-1, 485 | help="local_rank for distributed training on gpus") 486 | parser.add_argument('--seed', 487 | type=int, 488 | default=47, 489 | help="random seed for initialization") 490 | parser.add_argument('--gradient_accumulation_steps', 491 | type=int, 492 | default=1, 493 | help="Number of updates steps to accumulate before performing a backward/update pass.") 494 | parser.add_argument('--fp16', 495 | action='store_true', 496 | help="Whether to use 16-bit float precision instead of 32-bit") 497 | parser.add_argument('--loss_scale', 498 | type=float, default=0, 499 | help="Loss scaling to improve fp16 numeric stability. 
Only used when fp16 set to True.\n" 500 | "0 (default value): dynamic loss scaling.\n" 501 | "Positive power of 2: static loss scaling value.\n") 502 | parser.add_argument('--server_ip', type=str, default='', help="Can be used for distant debugging.") 503 | parser.add_argument('--server_port', type=str, default='', help="Can be used for distant debugging.") 504 | 505 | args = parser.parse_args() 506 | 507 | if args.server_ip and args.server_port: 508 | # Distant debugging - see https://code.visualstudio.com/docs/python/debugging#_attach-to-a-local-script 509 | import ptvsd 510 | print("Waiting for debugger attach") 511 | ptvsd.enable_attach(address=(args.server_ip, args.server_port), redirect_output=True) 512 | ptvsd.wait_for_attach() 513 | 514 | processors = { 515 | "cpi": BioBERTChemprotProcessor, 516 | "ddi": BioBERT_DDIProcessor, 517 | } 518 | 519 | output_modes = { 520 | "cpi": "classification", 521 | "ddi": "classification", 522 | } 523 | 524 | if args.local_rank == -1 or args.no_cuda: 525 | device = torch.device("cuda" if torch.cuda.is_available() and not args.no_cuda else "cpu") 526 | n_gpu = torch.cuda.device_count() 527 | else: 528 | torch.cuda.set_device(args.local_rank) 529 | device = torch.device("cuda", args.local_rank) 530 | n_gpu = 1 531 | # Initializes the distributed backend which will take care of sychronizing nodes/GPUs 532 | torch.distributed.init_process_group(backend='nccl') 533 | logger.info("device: {} n_gpu: {}, distributed training: {}, 16-bits training: {}".format( 534 | device, n_gpu, bool(args.local_rank != -1), args.fp16)) 535 | 536 | if args.gradient_accumulation_steps < 1: 537 | raise ValueError("Invalid gradient_accumulation_steps parameter: {}, should be >= 1".format( 538 | args.gradient_accumulation_steps)) 539 | 540 | args.train_batch_size = args.train_batch_size // args.gradient_accumulation_steps 541 | 542 | random.seed(args.seed) 543 | np.random.seed(args.seed) 544 | torch.manual_seed(args.seed) 545 | if n_gpu > 0: 546 | torch.cuda.manual_seed_all(args.seed) 547 | 548 | if not args.do_train and not args.do_eval: 549 | raise ValueError("At least one of `do_train` or `do_eval` must be True.") 550 | 551 | if os.path.exists(args.output_dir) and os.listdir(args.output_dir) and args.do_train: 552 | raise ValueError("Output directory ({}) already exists and is not empty.".format(args.output_dir)) 553 | if not os.path.exists(args.output_dir): 554 | os.makedirs(args.output_dir) 555 | 556 | task_name = args.task_name.lower() 557 | 558 | if task_name not in processors: 559 | raise ValueError("Task not found: %s" % (task_name)) 560 | 561 | processor = processors[task_name]() 562 | output_mode = output_modes[task_name] 563 | 564 | label_list = processor.get_labels() 565 | num_labels = len(label_list) 566 | 567 | tokenizer = BertTokenizer.from_pretrained(args.bert_model, do_lower_case=args.do_lower_case, 568 | never_split = ("[UNK]", "[SEP]", "[PAD]", "[CLS]", "[MASK]", "CPR:3", "CPR:4", "CPR:5", "CPR:6", "CPR:9")) 569 | 570 | tokenizer.vocab["CPR:3"] = 3 571 | tokenizer.vocab["CPR:4"] = 4 572 | tokenizer.vocab["CPR:5"] = 5 573 | tokenizer.vocab["CPR:6"] = 6 574 | tokenizer.vocab["CPR:9"] = 9 575 | train_examples = None 576 | num_train_optimization_steps = None 577 | 578 | 579 | if args.do_train: 580 | train_examples = processor.get_train_examples(args.data_dir) 581 | num_train_optimization_steps = int( 582 | len(train_examples) / args.train_batch_size / args.gradient_accumulation_steps) * args.num_train_epochs 583 | if args.local_rank != -1: 584 | 
num_train_optimization_steps = num_train_optimization_steps // torch.distributed.get_world_size() 585 | 586 | # Prepare model 587 | cache_dir = args.cache_dir if args.cache_dir else os.path.join(str(PYTORCH_PRETRAINED_BERT_CACHE), 'distributed_{}'.format(args.local_rank)) 588 | model = BertForSequenceClassification.from_pretrained(args.bert_model, 589 | cache_dir=cache_dir, 590 | num_labels=num_labels) 591 | if args.fp16: 592 | model.half() 593 | model.to(device) 594 | if args.local_rank != -1: 595 | try: 596 | from apex.parallel import DistributedDataParallel as DDP 597 | except ImportError: 598 | raise ImportError("Please install apex from https://www.github.com/nvidia/apex to use distributed and fp16 training.") 599 | 600 | model = DDP(model) 601 | elif n_gpu > 1: 602 | model = torch.nn.DataParallel(model) 603 | 604 | # Prepare optimizer 605 | param_optimizer = list(model.named_parameters()) 606 | no_decay = ['bias', 'LayerNorm.bias', 'LayerNorm.weight'] 607 | optimizer_grouped_parameters = [ 608 | {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)], 'weight_decay': 0.01}, 609 | {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)], 'weight_decay': 0.0} 610 | ] 611 | if args.fp16: 612 | try: 613 | from apex.optimizers import FP16_Optimizer 614 | from apex.optimizers import FusedAdam 615 | except ImportError: 616 | raise ImportError("Please install apex from https://www.github.com/nvidia/apex to use distributed and fp16 training.") 617 | 618 | optimizer = FusedAdam(optimizer_grouped_parameters, 619 | lr=args.learning_rate, 620 | bias_correction=False, 621 | max_grad_norm=1.0) 622 | if args.loss_scale == 0: 623 | optimizer = FP16_Optimizer(optimizer, dynamic_loss_scale=True) 624 | else: 625 | optimizer = FP16_Optimizer(optimizer, static_loss_scale=args.loss_scale) 626 | 627 | else: 628 | optimizer = BertAdam(optimizer_grouped_parameters, 629 | lr=args.learning_rate, 630 | warmup=args.warmup_proportion, 631 | t_total=num_train_optimization_steps, 632 | e=1e-08) 633 | 634 | global_step = 0 635 | nb_tr_steps = 0 636 | tr_loss = 0 637 | if args.do_train: 638 | train_features = convert_examples_to_features( 639 | train_examples, label_list, args.max_seq_length, tokenizer, output_mode) 640 | logger.info("***** Running training *****") 641 | logger.info(" Num examples = %d", len(train_examples)) 642 | logger.info(" Batch size = %d", args.train_batch_size) 643 | logger.info(" Num steps = %d", num_train_optimization_steps) 644 | 645 | all_input_ids = torch.tensor([f.input_ids for f in train_features], dtype=torch.long) 646 | all_input_mask = torch.tensor([f.input_mask for f in train_features], dtype=torch.long) 647 | all_segment_ids = torch.tensor([f.segment_ids for f in train_features], dtype=torch.long) 648 | all_P_gauss1_list = torch.tensor([f.P_gauss1_list for f in train_features], dtype=torch.float) 649 | all_P_gauss2_list = torch.tensor([f.P_gauss2_list for f in train_features], dtype=torch.float) 650 | 651 | if output_mode == "classification": 652 | all_label_ids = torch.tensor([f.label_id for f in train_features], dtype=torch.long) 653 | elif output_mode == "regression": 654 | all_label_ids = torch.tensor([f.label_id for f in train_features], dtype=torch.float) 655 | 656 | train_data = TensorDataset(all_input_ids, all_input_mask, all_segment_ids, all_P_gauss1_list, all_P_gauss2_list,all_label_ids) 657 | 658 | if args.local_rank == -1: 659 | train_sampler = RandomSampler(train_data) 660 | else: 661 | train_sampler = 
DistributedSampler(train_data) 662 | train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=args.train_batch_size) 663 | 664 | model.train() 665 | for _ in trange(int(args.num_train_epochs), desc="Epoch"): 666 | tr_loss = 0 667 | nb_tr_examples, nb_tr_steps = 0, 0 668 | for step, batch in enumerate(tqdm(train_dataloader, desc="Iteration")): 669 | batch = tuple(t.to(device) for t in batch) 670 | input_ids, input_mask, segment_ids, P_gauss1_list, P_gauss2_list, label_ids = batch 671 | 672 | # define a new function to compute loss values for both output_modes 673 | logits = model(input_ids, segment_ids, input_mask, P_gauss1_list, P_gauss2_list, labels=None) 674 | 675 | if output_mode == "classification": 676 | loss_fct = CrossEntropyLoss() 677 | loss = loss_fct(logits.view(-1, num_labels), label_ids.view(-1)) 678 | elif output_mode == "regression": 679 | loss_fct = MSELoss() 680 | loss = loss_fct(logits.view(-1), label_ids.view(-1)) 681 | 682 | if n_gpu > 1: 683 | loss = loss.mean() # mean() to average on multi-gpu. 684 | if args.gradient_accumulation_steps > 1: 685 | loss = loss / args.gradient_accumulation_steps 686 | 687 | if args.fp16: 688 | optimizer.backward(loss) 689 | else: 690 | loss.backward() 691 | 692 | tr_loss += loss.item() 693 | nb_tr_examples += input_ids.size(0) 694 | nb_tr_steps += 1 695 | if (step + 1) % args.gradient_accumulation_steps == 0: 696 | if args.fp16: 697 | # modify learning rate with special warm up BERT uses 698 | # if args.fp16 is False, BertAdam is used that handles this automatically 699 | lr_this_step = args.learning_rate * warmup_linear(global_step/num_train_optimization_steps, args.warmup_proportion) 700 | for param_group in optimizer.param_groups: 701 | param_group['lr'] = lr_this_step 702 | optimizer.step() 703 | optimizer.zero_grad() 704 | global_step += 1 705 | 706 | # Save a trained model and the associated configuration 707 | model_to_save = model.module if hasattr(model, 'module') else model # Only save the model it-self 708 | output_model_file = os.path.join(args.output_dir, WEIGHTS_NAME) 709 | torch.save(model_to_save.state_dict(), output_model_file) 710 | output_config_file = os.path.join(args.output_dir, CONFIG_NAME) 711 | with open(output_config_file, 'w') as f: 712 | f.write(model_to_save.config.to_json_string()) 713 | 714 | else: 715 | output_model_file = os.path.join(args.saved_model, 'pytorch_model.bin') 716 | output_config_file = os.path.join(args.saved_model, 'bert_config.json') 717 | config = BertConfig(output_config_file) 718 | model = BertForSequenceClassification(config, num_labels=num_labels) 719 | model.load_state_dict(torch.load(output_model_file)) 720 | model.to(device) 721 | 722 | 723 | if args.do_eval and (args.local_rank == -1 or torch.distributed.get_rank() == 0): 724 | eval_examples = processor.get_dev_examples(args.data_dir) 725 | eval_features = convert_examples_to_features( 726 | eval_examples, label_list, args.max_seq_length, tokenizer, output_mode) 727 | logger.info("***** Running evaluation *****") 728 | logger.info(" Num examples = %d", len(eval_examples)) 729 | logger.info(" Batch size = %d", args.eval_batch_size) 730 | 731 | all_input_ids = torch.tensor([f.input_ids for f in eval_features], dtype=torch.long) 732 | all_input_mask = torch.tensor([f.input_mask for f in eval_features], dtype=torch.long) 733 | all_segment_ids = torch.tensor([f.segment_ids for f in eval_features], dtype=torch.long) 734 | all_P_gauss1_list = torch.tensor([f.P_gauss1_list for f in eval_features], dtype=torch.float) 735 
| all_P_gauss2_list = torch.tensor([f.P_gauss2_list for f in eval_features], dtype=torch.float) 736 | 737 | if output_mode == "classification": 738 | all_label_ids = torch.tensor([f.label_id for f in eval_features], dtype=torch.long) 739 | elif output_mode == "regression": 740 | all_label_ids = torch.tensor([f.label_id for f in eval_features], dtype=torch.float) 741 | 742 | eval_data = TensorDataset(all_input_ids, all_input_mask, all_segment_ids, all_P_gauss1_list, all_P_gauss2_list, all_label_ids) 743 | 744 | # Run prediction for full data 745 | eval_sampler = SequentialSampler(eval_data) 746 | eval_dataloader = DataLoader(eval_data, sampler=eval_sampler, batch_size=args.eval_batch_size) 747 | 748 | model.eval() 749 | eval_loss = 0 750 | nb_eval_steps = 0 751 | preds = [] 752 | 753 | for input_ids, input_mask, segment_ids, P_gauss1_list, P_gauss2_list, label_ids in tqdm(eval_dataloader, desc="Evaluating"): 754 | input_ids = input_ids.to(device) 755 | input_mask = input_mask.to(device) 756 | segment_ids = segment_ids.to(device) 757 | P_gauss1_list = P_gauss1_list.to(device) 758 | P_gauss2_list = P_gauss2_list.to(device) 759 | label_ids = label_ids.to(device) 760 | 761 | with torch.no_grad(): 762 | logits = model(input_ids, segment_ids, input_mask, P_gauss1_list, P_gauss2_list, labels=None) 763 | 764 | # create eval loss and other metric required by the task 765 | if output_mode == "classification": 766 | loss_fct = CrossEntropyLoss() 767 | tmp_eval_loss = loss_fct(logits.view(-1, num_labels), label_ids.view(-1)) 768 | elif output_mode == "regression": 769 | loss_fct = MSELoss() 770 | tmp_eval_loss = loss_fct(logits.view(-1), label_ids.view(-1)) 771 | 772 | eval_loss += tmp_eval_loss.mean().item() 773 | nb_eval_steps += 1 774 | if len(preds) == 0: 775 | preds.append(logits.detach().cpu().numpy()) 776 | else: 777 | preds[0] = np.append( 778 | preds[0], logits.detach().cpu().numpy(), axis=0) 779 | 780 | eval_loss = eval_loss / nb_eval_steps 781 | preds = preds[0] 782 | if output_mode == "classification": 783 | preds = np.argmax(preds, axis=1) 784 | elif output_mode == "regression": 785 | preds = np.squeeze(preds) 786 | result = compute_metrics(task_name, preds, all_label_ids.numpy()) 787 | loss = tr_loss/nb_tr_steps if args.do_train else None 788 | 789 | result['eval_loss'] = eval_loss 790 | result['global_step'] = global_step 791 | result['loss'] = loss 792 | 793 | output_eval_file = os.path.join(args.output_dir, "eval_results.txt") 794 | with open(output_eval_file, "w") as writer: 795 | logger.info("***** Eval results *****") 796 | for key in sorted(result.keys()): 797 | logger.info(" %s = %s", key, str(result[key])) 798 | writer.write("%s = %s\n" % (key, str(result[key]))) 799 | 800 | 801 | if args.do_predict and (args.local_rank == -1 or torch.distributed.get_rank() == 0): 802 | test_examples = processor.get_test_examples(args.data_dir) 803 | test_features = convert_examples_to_features( 804 | test_examples, label_list, args.max_seq_length, tokenizer, output_mode) 805 | logger.info("***** Running prediction *****") 806 | logger.info(" Num examples = %d", len(test_examples)) 807 | logger.info(" Batch size = %d", args.predict_batch_size) 808 | 809 | all_input_ids = torch.tensor([f.input_ids for f in test_features], dtype=torch.long) 810 | all_input_mask = torch.tensor([f.input_mask for f in test_features], dtype=torch.long) 811 | all_segment_ids = torch.tensor([f.segment_ids for f in test_features], dtype=torch.long) 812 | all_P_gauss1_list = torch.tensor([f.P_gauss1_list for f in 
test_features], dtype=torch.float) 813 | all_P_gauss2_list = torch.tensor([f.P_gauss2_list for f in test_features], dtype=torch.float) 814 | 815 | if output_mode == "classification": 816 | all_label_ids = torch.tensor([f.label_id for f in test_features], dtype=torch.long) 817 | elif output_mode == "regression": 818 | all_label_ids = torch.tensor([f.label_id for f in test_features], dtype=torch.float) 819 | 820 | predict_data = TensorDataset(all_input_ids, all_input_mask, all_segment_ids, all_P_gauss1_list, all_P_gauss2_list, all_label_ids) 821 | 822 | # Run prediction for full data 823 | predict_sampler = SequentialSampler(predict_data) 824 | predict_dataloader = DataLoader(predict_data, sampler=predict_sampler, batch_size=args.predict_batch_size) 825 | 826 | model.eval() 827 | test_loss = 0 828 | nb_test_steps = 0 829 | preds = [] 830 | 831 | for input_ids, input_mask, segment_ids, P_gauss1_list, P_gauss2_list,label_ids in tqdm(predict_dataloader, desc="Predicting"): 832 | input_ids = input_ids.to(device) 833 | input_mask = input_mask.to(device) 834 | segment_ids = segment_ids.to(device) 835 | P_gauss1_list = P_gauss1_list.to(device) 836 | P_gauss2_list = P_gauss2_list.to(device) 837 | label_ids = label_ids.to(device) 838 | 839 | 840 | with torch.no_grad(): 841 | logits = model(input_ids, segment_ids, input_mask, P_gauss1_list, P_gauss2_list, labels=None) 842 | 843 | # create eval loss and other metric required by the task 844 | if output_mode == "classification": 845 | loss_fct = CrossEntropyLoss() 846 | tmp_test_loss = loss_fct(logits.view(-1, num_labels), label_ids.view(-1)) 847 | 848 | elif output_mode == "regression": 849 | loss_fct = MSELoss() 850 | tmp_test_loss = loss_fct(logits.view(-1), label_ids.view(-1)) 851 | 852 | test_loss += tmp_test_loss.mean().item() 853 | nb_test_steps += 1 854 | if len(preds) == 0: 855 | preds.append(logits.detach().cpu().numpy()) 856 | else: 857 | preds[0] = np.append( 858 | preds[0], logits.detach().cpu().numpy(), axis=0) 859 | 860 | test_loss = test_loss / nb_test_steps 861 | preds = preds[0] 862 | if output_mode == "classification": 863 | preds = np.argmax(preds, axis=1) 864 | elif output_mode == "regression": 865 | preds = np.squeeze(preds) 866 | result = compute_metrics(task_name, preds, all_label_ids.numpy()) 867 | loss = tr_loss/nb_tr_steps if args.do_train else None 868 | 869 | result['test_loss'] = test_loss 870 | result['global_step'] = global_step 871 | result['loss'] = loss 872 | 873 | output_test_file = os.path.join(args.output_dir, "test_results.txt") 874 | with open(output_test_file, "w") as writer: 875 | logger.info("***** Predict results *****") 876 | for key in sorted(result.keys()): 877 | logger.info(" %s = %s", key, str(result[key])) 878 | writer.write("%s = %s\n" % (key, str(result[key]))) 879 | 880 | 881 | if __name__ == "__main__": 882 | main() -------------------------------------------------------------------------------- /SourceCode/file_utils.py: -------------------------------------------------------------------------------- 1 | """ 2 | Utilities for working with the local dataset cache. 3 | This file is adapted from the AllenNLP library at https://github.com/allenai/allennlp 4 | Copyright by the AllenNLP authors. 
5 | """ 6 | from __future__ import (absolute_import, division, print_function, unicode_literals) 7 | 8 | import json 9 | import logging 10 | import os 11 | import shutil 12 | import tempfile 13 | from functools import wraps 14 | from hashlib import sha256 15 | import sys 16 | from io import open 17 | 18 | import boto3 19 | import requests 20 | from botocore.exceptions import ClientError 21 | from tqdm import tqdm 22 | 23 | try: 24 | from urllib.parse import urlparse 25 | except ImportError: 26 | from urlparse import urlparse 27 | 28 | try: 29 | from pathlib import Path 30 | PYTORCH_PRETRAINED_BERT_CACHE = Path(os.getenv('PYTORCH_PRETRAINED_BERT_CACHE', 31 | Path.home() / '.pytorch_pretrained_bert')) 32 | except (AttributeError, ImportError): 33 | PYTORCH_PRETRAINED_BERT_CACHE = os.getenv('PYTORCH_PRETRAINED_BERT_CACHE', 34 | os.path.join(os.path.expanduser("~"), '.pytorch_pretrained_bert')) 35 | 36 | logger = logging.getLogger(__name__) # pylint: disable=invalid-name 37 | 38 | 39 | def url_to_filename(url, etag=None): 40 | """ 41 | Convert `url` into a hashed filename in a repeatable way. 42 | If `etag` is specified, append its hash to the url's, delimited 43 | by a period. 44 | """ 45 | url_bytes = url.encode('utf-8') 46 | url_hash = sha256(url_bytes) 47 | filename = url_hash.hexdigest() 48 | 49 | if etag: 50 | etag_bytes = etag.encode('utf-8') 51 | etag_hash = sha256(etag_bytes) 52 | filename += '.' + etag_hash.hexdigest() 53 | 54 | return filename 55 | 56 | 57 | def filename_to_url(filename, cache_dir=None): 58 | """ 59 | Return the url and etag (which may be ``None``) stored for `filename`. 60 | Raise ``EnvironmentError`` if `filename` or its stored metadata do not exist. 61 | """ 62 | if cache_dir is None: 63 | cache_dir = PYTORCH_PRETRAINED_BERT_CACHE 64 | if sys.version_info[0] == 3 and isinstance(cache_dir, Path): 65 | cache_dir = str(cache_dir) 66 | 67 | cache_path = os.path.join(cache_dir, filename) 68 | if not os.path.exists(cache_path): 69 | raise EnvironmentError("file {} not found".format(cache_path)) 70 | 71 | meta_path = cache_path + '.json' 72 | if not os.path.exists(meta_path): 73 | raise EnvironmentError("file {} not found".format(meta_path)) 74 | 75 | with open(meta_path, encoding="utf-8") as meta_file: 76 | metadata = json.load(meta_file) 77 | url = metadata['url'] 78 | etag = metadata['etag'] 79 | 80 | return url, etag 81 | 82 | 83 | def cached_path(url_or_filename, cache_dir=None): 84 | """ 85 | Given something that might be a URL (or might be a local path), 86 | determine which. If it's a URL, download the file and cache it, and 87 | return the path to the cached file. If it's already a local path, 88 | make sure the file exists and then return the path. 89 | """ 90 | if cache_dir is None: 91 | cache_dir = PYTORCH_PRETRAINED_BERT_CACHE 92 | if sys.version_info[0] == 3 and isinstance(url_or_filename, Path): 93 | url_or_filename = str(url_or_filename) 94 | if sys.version_info[0] == 3 and isinstance(cache_dir, Path): 95 | cache_dir = str(cache_dir) 96 | 97 | parsed = urlparse(url_or_filename) 98 | 99 | if parsed.scheme in ('http', 'https', 's3'): 100 | # URL, so get it from the cache (downloading if necessary) 101 | return get_from_cache(url_or_filename, cache_dir) 102 | elif os.path.exists(url_or_filename): 103 | # File, and it exists. 104 | return url_or_filename 105 | elif parsed.scheme == '': 106 | # File, but it doesn't exist. 
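            # (e.g. a bare name such as "my_model.bin" with no URL scheme that
            # is not present on disk ends up in this branch.)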
107 | raise EnvironmentError("file {} not found".format(url_or_filename)) 108 | else: 109 | # Something unknown 110 | raise ValueError("unable to parse {} as a URL or as a local path".format(url_or_filename)) 111 | 112 | 113 | def split_s3_path(url): 114 | """Split a full s3 path into the bucket name and path.""" 115 | parsed = urlparse(url) 116 | if not parsed.netloc or not parsed.path: 117 | raise ValueError("bad s3 path {}".format(url)) 118 | bucket_name = parsed.netloc 119 | s3_path = parsed.path 120 | # Remove '/' at beginning of path. 121 | if s3_path.startswith("/"): 122 | s3_path = s3_path[1:] 123 | return bucket_name, s3_path 124 | 125 | 126 | def s3_request(func): 127 | """ 128 | Wrapper function for s3 requests in order to create more helpful error 129 | messages. 130 | """ 131 | 132 | @wraps(func) 133 | def wrapper(url, *args, **kwargs): 134 | try: 135 | return func(url, *args, **kwargs) 136 | except ClientError as exc: 137 | if int(exc.response["Error"]["Code"]) == 404: 138 | raise EnvironmentError("file {} not found".format(url)) 139 | else: 140 | raise 141 | 142 | return wrapper 143 | 144 | 145 | @s3_request 146 | def s3_etag(url): 147 | """Check ETag on S3 object.""" 148 | s3_resource = boto3.resource("s3") 149 | bucket_name, s3_path = split_s3_path(url) 150 | s3_object = s3_resource.Object(bucket_name, s3_path) 151 | return s3_object.e_tag 152 | 153 | 154 | @s3_request 155 | def s3_get(url, temp_file): 156 | """Pull a file directly from S3.""" 157 | s3_resource = boto3.resource("s3") 158 | bucket_name, s3_path = split_s3_path(url) 159 | s3_resource.Bucket(bucket_name).download_fileobj(s3_path, temp_file) 160 | 161 | 162 | def http_get(url, temp_file): 163 | req = requests.get(url, stream=True) 164 | content_length = req.headers.get('Content-Length') 165 | total = int(content_length) if content_length is not None else None 166 | progress = tqdm(unit="B", total=total) 167 | for chunk in req.iter_content(chunk_size=1024): 168 | if chunk: # filter out keep-alive new chunks 169 | progress.update(len(chunk)) 170 | temp_file.write(chunk) 171 | progress.close() 172 | 173 | 174 | def get_from_cache(url, cache_dir=None): 175 | """ 176 | Given a URL, look for the corresponding dataset in the local cache. 177 | If it's not there, download it. Then return the path to the cached file. 178 | """ 179 | if cache_dir is None: 180 | cache_dir = PYTORCH_PRETRAINED_BERT_CACHE 181 | if sys.version_info[0] == 3 and isinstance(cache_dir, Path): 182 | cache_dir = str(cache_dir) 183 | 184 | if not os.path.exists(cache_dir): 185 | os.makedirs(cache_dir) 186 | 187 | # Get eTag to add to filename, if it exists. 188 | if url.startswith("s3://"): 189 | etag = s3_etag(url) 190 | else: 191 | response = requests.head(url, allow_redirects=True) 192 | if response.status_code != 200: 193 | raise IOError("HEAD request failed for url {} with status code {}" 194 | .format(url, response.status_code)) 195 | etag = response.headers.get("ETag") 196 | 197 | filename = url_to_filename(url, etag) 198 | 199 | # get cache path to put the file 200 | cache_path = os.path.join(cache_dir, filename) 201 | 202 | if not os.path.exists(cache_path): 203 | # Download to temporary file, then copy to cache dir once finished. 204 | # Otherwise you get corrupt cache entries if the download gets interrupted. 
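        # NamedTemporaryFile is deleted automatically when the `with` block
        # exits, so after a successful download only the cached file and its
        # small "<cache_path>.json" metadata sidecar (url + etag) remain.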
205 | with tempfile.NamedTemporaryFile() as temp_file: 206 | logger.info("%s not found in cache, downloading to %s", url, temp_file.name) 207 | 208 | # GET file object 209 | if url.startswith("s3://"): 210 | s3_get(url, temp_file) 211 | else: 212 | http_get(url, temp_file) 213 | 214 | # we are copying the file before closing it, so flush to avoid truncation 215 | temp_file.flush() 216 | # shutil.copyfileobj() starts at the current position, so go to the start 217 | temp_file.seek(0) 218 | 219 | logger.info("copying %s to cache at %s", temp_file.name, cache_path) 220 | with open(cache_path, 'wb') as cache_file: 221 | shutil.copyfileobj(temp_file, cache_file) 222 | 223 | logger.info("creating metadata file for %s", cache_path) 224 | meta = {'url': url, 'etag': etag} 225 | meta_path = cache_path + '.json' 226 | with open(meta_path, 'w', encoding="utf-8") as meta_file: 227 | json.dump(meta, meta_file) 228 | 229 | logger.info("removing temp file %s", temp_file.name) 230 | 231 | return cache_path 232 | 233 | 234 | def read_set_from_file(filename): 235 | ''' 236 | Extract a de-duped collection (set) of text from a file. 237 | Expected file format is one item per line. 238 | ''' 239 | collection = set() 240 | with open(filename, 'r', encoding='utf-8') as file_: 241 | for line in file_: 242 | collection.add(line.rstrip()) 243 | return collection 244 | 245 | 246 | def get_file_extension(path, dot=True, lower=True): 247 | ext = os.path.splitext(path)[1] 248 | ext = ext if dot else ext[1:] 249 | return ext.lower() if lower else ext 250 | -------------------------------------------------------------------------------- /SourceCode/modeling.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | # Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team. 3 | # Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved. 4 | # 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 7 | # You may obtain a copy of the License at 8 | # 9 | # http://www.apache.org/licenses/LICENSE-2.0 10 | # 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 
16 | """PyTorch BERT model.""" 17 | 18 | from __future__ import absolute_import, division, print_function, unicode_literals 19 | 20 | import copy 21 | import json 22 | import logging 23 | import math 24 | import os 25 | import shutil 26 | import tarfile 27 | import tempfile 28 | import sys 29 | from io import open 30 | 31 | import torch 32 | from torch import nn 33 | from torch.nn import CrossEntropyLoss 34 | 35 | import file_utils 36 | from file_utils import cached_path 37 | import numpy as np 38 | import math 39 | import scipy.stats 40 | 41 | 42 | logger = logging.getLogger(__name__) 43 | 44 | PRETRAINED_MODEL_ARCHIVE_MAP = { 45 | 'bert-base-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased.tar.gz", 46 | 'bert-large-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased.tar.gz", 47 | 'bert-base-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased.tar.gz", 48 | 'bert-large-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased.tar.gz", 49 | 'bert-base-multilingual-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-uncased.tar.gz", 50 | 'bert-base-multilingual-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased.tar.gz", 51 | 'bert-base-chinese': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese.tar.gz", 52 | } 53 | CONFIG_NAME = 'bert_config.json' 54 | WEIGHTS_NAME = 'pytorch_model.bin' 55 | TF_WEIGHTS_NAME = 'model.ckpt' 56 | 57 | def load_tf_weights_in_bert(model, tf_checkpoint_path): 58 | """ Load tf checkpoints in a pytorch model 59 | """ 60 | try: 61 | import re 62 | import numpy as np 63 | import tensorflow as tf 64 | except ImportError: 65 | print("Loading a TensorFlow models in PyTorch, requires TensorFlow to be installed. 
Please see " 66 | "https://www.tensorflow.org/install/ for installation instructions.") 67 | raise 68 | tf_path = os.path.abspath(tf_checkpoint_path) 69 | print("Converting TensorFlow checkpoint from {}".format(tf_path)) 70 | # Load weights from TF model 71 | init_vars = tf.train.list_variables(tf_path) 72 | names = [] 73 | arrays = [] 74 | for name, shape in init_vars: 75 | print("Loading TF weight {} with shape {}".format(name, shape)) 76 | array = tf.train.load_variable(tf_path, name) 77 | names.append(name) 78 | arrays.append(array) 79 | 80 | for name, array in zip(names, arrays): 81 | name = name.split('/') 82 | # adam_v and adam_m are variables used in AdamWeightDecayOptimizer to calculated m and v 83 | # which are not required for using pretrained model 84 | if any(n in ["adam_v", "adam_m", "global_step"] for n in name): 85 | print("Skipping {}".format("/".join(name))) 86 | continue 87 | pointer = model 88 | for m_name in name: 89 | if re.fullmatch(r'[A-Za-z]+_\d+', m_name): 90 | l = re.split(r'_(\d+)', m_name) 91 | else: 92 | l = [m_name] 93 | if l[0] == 'kernel' or l[0] == 'gamma': 94 | pointer = getattr(pointer, 'weight') 95 | elif l[0] == 'output_bias' or l[0] == 'beta': 96 | pointer = getattr(pointer, 'bias') 97 | elif l[0] == 'output_weights': 98 | pointer = getattr(pointer, 'weight') 99 | elif l[0] == 'squad': 100 | pointer = getattr(pointer, 'classifier') 101 | else: 102 | try: 103 | pointer = getattr(pointer, l[0]) 104 | except AttributeError: 105 | print("Skipping {}".format("/".join(name))) 106 | continue 107 | if len(l) >= 2: 108 | num = int(l[1]) 109 | pointer = pointer[num] 110 | if m_name[-11:] == '_embeddings': 111 | pointer = getattr(pointer, 'weight') 112 | elif m_name == 'kernel': 113 | array = np.transpose(array) 114 | try: 115 | assert pointer.shape == array.shape 116 | except AssertionError as e: 117 | e.args += (pointer.shape, array.shape) 118 | raise 119 | print("Initialize PyTorch weight {}".format(name)) 120 | pointer.data = torch.from_numpy(array) 121 | return model 122 | 123 | 124 | def gelu(x): 125 | """Implementation of the gelu activation function. 126 | For information: OpenAI GPT's gelu is slightly different (and gives slightly different results): 127 | 0.5 * x * (1 + torch.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * torch.pow(x, 3)))) 128 | Also see https://arxiv.org/abs/1606.08415 129 | """ 130 | return x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0))) 131 | 132 | 133 | def swish(x): 134 | return x * torch.sigmoid(x) 135 | 136 | 137 | ACT2FN = {"gelu": gelu, "relu": torch.nn.functional.relu, "swish": swish} 138 | 139 | 140 | class BertConfig(object): 141 | """Configuration class to store the configuration of a `BertModel`. 142 | """ 143 | def __init__(self, 144 | vocab_size_or_config_json_file, 145 | hidden_size=768, 146 | num_hidden_layers=12, 147 | num_attention_heads=12, 148 | intermediate_size=3072, 149 | hidden_act="gelu", 150 | hidden_dropout_prob=0.1, 151 | attention_probs_dropout_prob=0.1, 152 | max_position_embeddings=512, 153 | type_vocab_size=2, 154 | initializer_range=0.02): 155 | """Constructs BertConfig. 156 | 157 | Args: 158 | vocab_size_or_config_json_file: Vocabulary size of `inputs_ids` in `BertModel`. 159 | hidden_size: Size of the encoder layers and the pooler layer. 160 | num_hidden_layers: Number of hidden layers in the Transformer encoder. 161 | num_attention_heads: Number of attention heads for each attention layer in 162 | the Transformer encoder. 
163 | intermediate_size: The size of the "intermediate" (i.e., feed-forward) 164 | layer in the Transformer encoder. 165 | hidden_act: The non-linear activation function (function or string) in the 166 | encoder and pooler. If string, "gelu", "relu" and "swish" are supported. 167 | hidden_dropout_prob: The dropout probabilitiy for all fully connected 168 | layers in the embeddings, encoder, and pooler. 169 | attention_probs_dropout_prob: The dropout ratio for the attention 170 | probabilities. 171 | max_position_embeddings: The maximum sequence length that this model might 172 | ever be used with. Typically set this to something large just in case 173 | (e.g., 512 or 1024 or 2048). 174 | type_vocab_size: The vocabulary size of the `token_type_ids` passed into 175 | `BertModel`. 176 | initializer_range: The sttdev of the truncated_normal_initializer for 177 | initializing all weight matrices. 178 | """ 179 | if isinstance(vocab_size_or_config_json_file, str) or (sys.version_info[0] == 2 180 | and isinstance(vocab_size_or_config_json_file, unicode)): 181 | with open(vocab_size_or_config_json_file, "r", encoding='utf-8') as reader: 182 | json_config = json.loads(reader.read()) 183 | for key, value in json_config.items(): 184 | self.__dict__[key] = value 185 | elif isinstance(vocab_size_or_config_json_file, int): 186 | self.vocab_size = vocab_size_or_config_json_file 187 | self.hidden_size = hidden_size 188 | self.num_hidden_layers = num_hidden_layers 189 | self.num_attention_heads = num_attention_heads 190 | self.hidden_act = hidden_act 191 | self.intermediate_size = intermediate_size 192 | self.hidden_dropout_prob = hidden_dropout_prob 193 | self.attention_probs_dropout_prob = attention_probs_dropout_prob 194 | self.max_position_embeddings = max_position_embeddings 195 | self.type_vocab_size = type_vocab_size 196 | self.initializer_range = initializer_range 197 | else: 198 | raise ValueError("First argument must be either a vocabulary size (int)" 199 | "or the path to a pretrained model config file (str)") 200 | 201 | @classmethod 202 | def from_dict(cls, json_object): 203 | """Constructs a `BertConfig` from a Python dictionary of parameters.""" 204 | config = BertConfig(vocab_size_or_config_json_file=-1) 205 | for key, value in json_object.items(): 206 | config.__dict__[key] = value 207 | return config 208 | 209 | @classmethod 210 | def from_json_file(cls, json_file): 211 | """Constructs a `BertConfig` from a json file of parameters.""" 212 | with open(json_file, "r", encoding='utf-8') as reader: 213 | text = reader.read() 214 | return cls.from_dict(json.loads(text)) 215 | 216 | def __repr__(self): 217 | return str(self.to_json_string()) 218 | 219 | def to_dict(self): 220 | """Serializes this instance to a Python dictionary.""" 221 | output = copy.deepcopy(self.__dict__) 222 | return output 223 | 224 | def to_json_string(self): 225 | """Serializes this instance to a JSON string.""" 226 | return json.dumps(self.to_dict(), indent=2, sort_keys=True) + "\n" 227 | 228 | try: 229 | from apex.normalization.fused_layer_norm import FusedLayerNorm as BertLayerNorm 230 | except ImportError: 231 | logger.info("Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex .") 232 | class BertLayerNorm(nn.Module): 233 | def __init__(self, hidden_size, eps=1e-12): 234 | """Construct a layernorm module in the TF style (epsilon inside the square root). 
235 | """ 236 | super(BertLayerNorm, self).__init__() 237 | self.weight = nn.Parameter(torch.ones(hidden_size)) 238 | self.bias = nn.Parameter(torch.zeros(hidden_size)) 239 | self.variance_epsilon = eps 240 | 241 | def forward(self, x): 242 | u = x.mean(-1, keepdim=True) 243 | s = (x - u).pow(2).mean(-1, keepdim=True) 244 | x = (x - u) / torch.sqrt(s + self.variance_epsilon) 245 | return self.weight * x + self.bias 246 | 247 | class BertEmbeddings(nn.Module): 248 | """Construct the embeddings from word, position and token_type embeddings. 249 | """ 250 | def __init__(self, config): 251 | super(BertEmbeddings, self).__init__() 252 | self.word_embeddings = nn.Embedding(config.vocab_size, config.hidden_size, padding_idx=0) 253 | self.position_embeddings = nn.Embedding(config.max_position_embeddings, config.hidden_size) 254 | self.token_type_embeddings = nn.Embedding(config.type_vocab_size, config.hidden_size) 255 | 256 | # self.LayerNorm is not snake-cased to stick with TensorFlow model variable name and be able to load 257 | # any TensorFlow checkpoint file 258 | self.LayerNorm = BertLayerNorm(config.hidden_size, eps=1e-12) 259 | self.dropout = nn.Dropout(config.hidden_dropout_prob) 260 | 261 | def forward(self, input_ids, token_type_ids=None): 262 | seq_length = input_ids.size(1) 263 | position_ids = torch.arange(seq_length, dtype=torch.long, device=input_ids.device) 264 | position_ids = position_ids.unsqueeze(0).expand_as(input_ids) 265 | if token_type_ids is None: 266 | token_type_ids = torch.zeros_like(input_ids) 267 | 268 | words_embeddings = self.word_embeddings(input_ids) 269 | position_embeddings = self.position_embeddings(position_ids) 270 | token_type_embeddings = self.token_type_embeddings(token_type_ids) 271 | 272 | embeddings = words_embeddings + position_embeddings + token_type_embeddings 273 | embeddings = self.LayerNorm(embeddings) 274 | embeddings = self.dropout(embeddings) 275 | return embeddings 276 | 277 | 278 | class BertSelfAttention(nn.Module): 279 | def __init__(self, config): 280 | super(BertSelfAttention, self).__init__() 281 | if config.hidden_size % config.num_attention_heads != 0: 282 | raise ValueError( 283 | "The hidden size (%d) is not a multiple of the number of attention " 284 | "heads (%d)" % (config.hidden_size, config.num_attention_heads)) 285 | self.num_attention_heads = config.num_attention_heads 286 | self.attention_head_size = int(config.hidden_size / config.num_attention_heads) 287 | self.all_head_size = self.num_attention_heads * self.attention_head_size 288 | 289 | self.query = nn.Linear(config.hidden_size, self.all_head_size) 290 | self.key = nn.Linear(config.hidden_size, self.all_head_size) 291 | self.value = nn.Linear(config.hidden_size, self.all_head_size) 292 | 293 | self.dropout = nn.Dropout(config.attention_probs_dropout_prob) 294 | 295 | def transpose_for_scores(self, x): 296 | new_x_shape = x.size()[:-1] + (self.num_attention_heads, self.attention_head_size) 297 | x = x.view(*new_x_shape) 298 | return x.permute(0, 2, 1, 3) 299 | 300 | def forward(self, hidden_states, attention_mask): 301 | mixed_query_layer = self.query(hidden_states) 302 | mixed_key_layer = self.key(hidden_states) 303 | mixed_value_layer = self.value(hidden_states) 304 | 305 | query_layer = self.transpose_for_scores(mixed_query_layer) 306 | key_layer = self.transpose_for_scores(mixed_key_layer) 307 | value_layer = self.transpose_for_scores(mixed_value_layer) 308 | 309 | # Take the dot product between "query" and "key" to get the raw attention scores. 
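        # query_layer / key_layer / value_layer have shape
        # [batch_size, num_heads, seq_len, head_size], so the matmul below
        # produces raw scores of shape [batch_size, num_heads, seq_len, seq_len],
        # which are then scaled by 1 / sqrt(head_size).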
310 | attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2)) 311 | attention_scores = attention_scores / math.sqrt(self.attention_head_size) 312 | # Apply the attention mask is (precomputed for all layers in BertModel forward() function) 313 | attention_scores = attention_scores + attention_mask 314 | 315 | # Normalize the attention scores to probabilities. 316 | attention_probs = nn.Softmax(dim=-1)(attention_scores) 317 | 318 | # This is actually dropping out entire tokens to attend to, which might 319 | # seem a bit unusual, but is taken from the original Transformer paper. 320 | attention_probs = self.dropout(attention_probs) 321 | 322 | context_layer = torch.matmul(attention_probs, value_layer) 323 | context_layer = context_layer.permute(0, 2, 1, 3).contiguous() 324 | new_context_layer_shape = context_layer.size()[:-2] + (self.all_head_size,) 325 | context_layer = context_layer.view(*new_context_layer_shape) 326 | return context_layer 327 | 328 | 329 | class BertSelfOutput(nn.Module): 330 | def __init__(self, config): 331 | super(BertSelfOutput, self).__init__() 332 | self.dense = nn.Linear(config.hidden_size, config.hidden_size) 333 | self.LayerNorm = BertLayerNorm(config.hidden_size, eps=1e-12) 334 | self.dropout = nn.Dropout(config.hidden_dropout_prob) 335 | 336 | def forward(self, hidden_states, input_tensor): 337 | hidden_states = self.dense(hidden_states) 338 | hidden_states = self.dropout(hidden_states) 339 | hidden_states = self.LayerNorm(hidden_states + input_tensor) 340 | return hidden_states 341 | 342 | 343 | class BertAttention(nn.Module): 344 | def __init__(self, config): 345 | super(BertAttention, self).__init__() 346 | self.self = BertSelfAttention(config) 347 | self.output = BertSelfOutput(config) 348 | 349 | def forward(self, input_tensor, attention_mask): 350 | self_output = self.self(input_tensor, attention_mask) 351 | attention_output = self.output(self_output, input_tensor) 352 | return attention_output 353 | 354 | 355 | class BertIntermediate(nn.Module): 356 | def __init__(self, config): 357 | super(BertIntermediate, self).__init__() 358 | self.dense = nn.Linear(config.hidden_size, config.intermediate_size) 359 | if isinstance(config.hidden_act, str) or (sys.version_info[0] == 2 and isinstance(config.hidden_act, unicode)): 360 | self.intermediate_act_fn = ACT2FN[config.hidden_act] 361 | else: 362 | self.intermediate_act_fn = config.hidden_act 363 | 364 | def forward(self, hidden_states): 365 | hidden_states = self.dense(hidden_states) 366 | hidden_states = self.intermediate_act_fn(hidden_states) 367 | return hidden_states 368 | 369 | 370 | class BertOutput(nn.Module): 371 | def __init__(self, config): 372 | super(BertOutput, self).__init__() 373 | self.dense = nn.Linear(config.intermediate_size, config.hidden_size) 374 | self.LayerNorm = BertLayerNorm(config.hidden_size, eps=1e-12) 375 | self.dropout = nn.Dropout(config.hidden_dropout_prob) 376 | 377 | def forward(self, hidden_states, input_tensor): 378 | hidden_states = self.dense(hidden_states) 379 | hidden_states = self.dropout(hidden_states) 380 | hidden_states = self.LayerNorm(hidden_states + input_tensor) 381 | return hidden_states 382 | 383 | 384 | class BertLayer(nn.Module): 385 | def __init__(self, config): 386 | super(BertLayer, self).__init__() 387 | self.attention = BertAttention(config) 388 | self.intermediate = BertIntermediate(config) 389 | self.output = BertOutput(config) 390 | 391 | def forward(self, hidden_states, attention_mask): 392 | attention_output = 
self.attention(hidden_states, attention_mask) 393 | intermediate_output = self.intermediate(attention_output) 394 | layer_output = self.output(intermediate_output, attention_output) 395 | return layer_output 396 | 397 | 398 | class BertEncoder(nn.Module): 399 | def __init__(self, config): 400 | super(BertEncoder, self).__init__() 401 | layer = BertLayer(config) 402 | self.layer = nn.ModuleList([copy.deepcopy(layer) for _ in range(config.num_hidden_layers)]) 403 | 404 | def forward(self, hidden_states, attention_mask, output_all_encoded_layers=True): 405 | all_encoder_layers = [] 406 | for layer_module in self.layer: 407 | hidden_states = layer_module(hidden_states, attention_mask) 408 | if output_all_encoded_layers: 409 | all_encoder_layers.append(hidden_states) 410 | if not output_all_encoded_layers: 411 | all_encoder_layers.append(hidden_states) 412 | return all_encoder_layers 413 | 414 | 415 | class BertPooler(nn.Module): 416 | def __init__(self, config): 417 | super(BertPooler, self).__init__() 418 | self.dense = nn.Linear(config.hidden_size, config.hidden_size) 419 | self.activation = nn.Tanh() 420 | 421 | def forward(self, hidden_states): 422 | # We "pool" the model by simply taking the hidden state corresponding 423 | # to the first token. 424 | first_token_tensor = hidden_states[:, 0] 425 | pooled_output = self.dense(first_token_tensor) 426 | pooled_output = self.activation(pooled_output) 427 | return pooled_output 428 | 429 | 430 | class BertPredictionHeadTransform(nn.Module): 431 | def __init__(self, config): 432 | super(BertPredictionHeadTransform, self).__init__() 433 | self.dense = nn.Linear(config.hidden_size, config.hidden_size) 434 | if isinstance(config.hidden_act, str) or (sys.version_info[0] == 2 and isinstance(config.hidden_act, unicode)): 435 | self.transform_act_fn = ACT2FN[config.hidden_act] 436 | else: 437 | self.transform_act_fn = config.hidden_act 438 | self.LayerNorm = BertLayerNorm(config.hidden_size, eps=1e-12) 439 | 440 | def forward(self, hidden_states): 441 | hidden_states = self.dense(hidden_states) 442 | hidden_states = self.transform_act_fn(hidden_states) 443 | hidden_states = self.LayerNorm(hidden_states) 444 | return hidden_states 445 | 446 | 447 | class BertLMPredictionHead(nn.Module): 448 | def __init__(self, config, bert_model_embedding_weights): 449 | super(BertLMPredictionHead, self).__init__() 450 | self.transform = BertPredictionHeadTransform(config) 451 | 452 | # The output weights are the same as the input embeddings, but there is 453 | # an output-only bias for each token. 
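        # bert_model_embedding_weights has shape [vocab_size, hidden_size], so
        # the decoder below maps hidden_size -> vocab_size and ties its weight
        # matrix to the input word embeddings; only `self.bias` is a new
        # parameter.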
454 | self.decoder = nn.Linear(bert_model_embedding_weights.size(1), 455 | bert_model_embedding_weights.size(0), 456 | bias=False) 457 | self.decoder.weight = bert_model_embedding_weights 458 | self.bias = nn.Parameter(torch.zeros(bert_model_embedding_weights.size(0))) 459 | 460 | def forward(self, hidden_states): 461 | hidden_states = self.transform(hidden_states) 462 | hidden_states = self.decoder(hidden_states) + self.bias 463 | return hidden_states 464 | 465 | 466 | class BertOnlyMLMHead(nn.Module): 467 | def __init__(self, config, bert_model_embedding_weights): 468 | super(BertOnlyMLMHead, self).__init__() 469 | self.predictions = BertLMPredictionHead(config, bert_model_embedding_weights) 470 | 471 | def forward(self, sequence_output): 472 | prediction_scores = self.predictions(sequence_output) 473 | return prediction_scores 474 | 475 | 476 | class BertOnlyNSPHead(nn.Module): 477 | def __init__(self, config): 478 | super(BertOnlyNSPHead, self).__init__() 479 | self.seq_relationship = nn.Linear(config.hidden_size, 2) 480 | 481 | def forward(self, pooled_output): 482 | seq_relationship_score = self.seq_relationship(pooled_output) 483 | return seq_relationship_score 484 | 485 | 486 | class BertPreTrainingHeads(nn.Module): 487 | def __init__(self, config, bert_model_embedding_weights): 488 | super(BertPreTrainingHeads, self).__init__() 489 | self.predictions = BertLMPredictionHead(config, bert_model_embedding_weights) 490 | self.seq_relationship = nn.Linear(config.hidden_size, 2) 491 | 492 | def forward(self, sequence_output, pooled_output): 493 | prediction_scores = self.predictions(sequence_output) 494 | seq_relationship_score = self.seq_relationship(pooled_output) 495 | return prediction_scores, seq_relationship_score 496 | 497 | 498 | class BertPreTrainedModel(nn.Module): 499 | """ An abstract class to handle weights initialization and 500 | a simple interface for dowloading and loading pretrained models. 501 | """ 502 | def __init__(self, config, *inputs, **kwargs): 503 | super(BertPreTrainedModel, self).__init__() 504 | if not isinstance(config, BertConfig): 505 | raise ValueError( 506 | "Parameter config in `{}(config)` should be an instance of class `BertConfig`. " 507 | "To create a model from a Google pretrained model use " 508 | "`model = {}.from_pretrained(PRETRAINED_MODEL_NAME)`".format( 509 | self.__class__.__name__, self.__class__.__name__ 510 | )) 511 | self.config = config 512 | 513 | def init_bert_weights(self, module): 514 | """ Initialize the weights. 515 | """ 516 | if isinstance(module, (nn.Linear, nn.Embedding)): 517 | # Slightly different from the TF version which uses truncated_normal for initialization 518 | # cf https://github.com/pytorch/pytorch/pull/5617 519 | module.weight.data.normal_(mean=0.0, std=self.config.initializer_range) 520 | elif isinstance(module, BertLayerNorm): 521 | module.bias.data.zero_() 522 | module.weight.data.fill_(1.0) 523 | if isinstance(module, nn.Linear) and module.bias is not None: 524 | module.bias.data.zero_() 525 | 526 | @classmethod 527 | def from_pretrained(cls, pretrained_model_name_or_path, state_dict=None, cache_dir=None, 528 | from_tf=False, *inputs, **kwargs): 529 | """ 530 | Instantiate a BertPreTrainedModel from a pre-trained model file or a pytorch state dict. 531 | Download and cache the pre-trained model file if needed. 532 | 533 | Params: 534 | pretrained_model_name_or_path: either: 535 | - a str with the name of a pre-trained model to load selected in the list of: 536 | . `bert-base-uncased` 537 | . 
`bert-large-uncased` 538 | . `bert-base-cased` 539 | . `bert-large-cased` 540 | . `bert-base-multilingual-uncased` 541 | . `bert-base-multilingual-cased` 542 | . `bert-base-chinese` 543 | - a path or url to a pretrained model archive containing: 544 | . `bert_config.json` a configuration file for the model 545 | . `pytorch_model.bin` a PyTorch dump of a BertForPreTraining instance 546 | - a path or url to a pretrained model archive containing: 547 | . `bert_config.json` a configuration file for the model 548 | . `model.chkpt` a TensorFlow checkpoint 549 | from_tf: should we load the weights from a locally saved TensorFlow checkpoint 550 | cache_dir: an optional path to a folder in which the pre-trained models will be cached. 551 | state_dict: an optional state dictionnary (collections.OrderedDict object) to use instead of Google pre-trained models 552 | *inputs, **kwargs: additional input for the specific Bert class 553 | (ex: num_labels for BertForSequenceClassification) 554 | """ 555 | if pretrained_model_name_or_path in PRETRAINED_MODEL_ARCHIVE_MAP: 556 | archive_file = PRETRAINED_MODEL_ARCHIVE_MAP[pretrained_model_name_or_path] 557 | else: 558 | archive_file = pretrained_model_name_or_path 559 | # redirect to the cache, if necessary 560 | try: 561 | resolved_archive_file = cached_path(archive_file, cache_dir=cache_dir) 562 | except EnvironmentError: 563 | logger.error( 564 | "Model name '{}' was not found in model name list ({}). " 565 | "We assumed '{}' was a path or url but couldn't find any file " 566 | "associated to this path or url.".format( 567 | pretrained_model_name_or_path, 568 | ', '.join(PRETRAINED_MODEL_ARCHIVE_MAP.keys()), 569 | archive_file)) 570 | return None 571 | if resolved_archive_file == archive_file: 572 | logger.info("loading archive file {}".format(archive_file)) 573 | else: 574 | logger.info("loading archive file {} from cache at {}".format( 575 | archive_file, resolved_archive_file)) 576 | tempdir = None 577 | if os.path.isdir(resolved_archive_file) or from_tf: 578 | serialization_dir = resolved_archive_file 579 | else: 580 | # Extract archive to temp dir 581 | tempdir = tempfile.mkdtemp() 582 | logger.info("extracting archive file {} to temp dir {}".format( 583 | resolved_archive_file, tempdir)) 584 | with tarfile.open(resolved_archive_file, 'r:gz') as archive: 585 | archive.extractall(tempdir) 586 | serialization_dir = tempdir 587 | # Load config 588 | config_file = os.path.join(serialization_dir, CONFIG_NAME) 589 | config = BertConfig.from_json_file(config_file) 590 | logger.info("Model config {}".format(config)) 591 | # Instantiate model. 
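        # cls(config, *inputs, **kwargs) builds a randomly initialised model;
        # the pretrained weights are only copied in further below -- via
        # load_tf_weights_in_bert for TF checkpoints, or via load() after the
        # TF-style gamma/beta keys are renamed to weight/bias for PyTorch
        # state dicts.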
592 | model = cls(config, *inputs, **kwargs) 593 | if state_dict is None and not from_tf: 594 | weights_path = os.path.join(serialization_dir, WEIGHTS_NAME) 595 | state_dict = torch.load(weights_path, map_location='cpu' if not torch.cuda.is_available() else None) 596 | if tempdir: 597 | # Clean up temp dir 598 | shutil.rmtree(tempdir) 599 | if from_tf: 600 | # Directly load from a TensorFlow checkpoint 601 | weights_path = os.path.join(serialization_dir, TF_WEIGHTS_NAME) 602 | return load_tf_weights_in_bert(model, weights_path) 603 | # Load from a PyTorch state_dict 604 | old_keys = [] 605 | new_keys = [] 606 | for key in state_dict.keys(): 607 | new_key = None 608 | if 'gamma' in key: 609 | new_key = key.replace('gamma', 'weight') 610 | if 'beta' in key: 611 | new_key = key.replace('beta', 'bias') 612 | if new_key: 613 | old_keys.append(key) 614 | new_keys.append(new_key) 615 | for old_key, new_key in zip(old_keys, new_keys): 616 | state_dict[new_key] = state_dict.pop(old_key) 617 | 618 | missing_keys = [] 619 | unexpected_keys = [] 620 | error_msgs = [] 621 | # copy state_dict so _load_from_state_dict can modify it 622 | metadata = getattr(state_dict, '_metadata', None) 623 | state_dict = state_dict.copy() 624 | if metadata is not None: 625 | state_dict._metadata = metadata 626 | 627 | def load(module, prefix=''): 628 | local_metadata = {} if metadata is None else metadata.get(prefix[:-1], {}) 629 | module._load_from_state_dict( 630 | state_dict, prefix, local_metadata, True, missing_keys, unexpected_keys, error_msgs) 631 | for name, child in module._modules.items(): 632 | if child is not None: 633 | load(child, prefix + name + '.') 634 | start_prefix = '' 635 | if not hasattr(model, 'bert') and any(s.startswith('bert.') for s in state_dict.keys()): 636 | start_prefix = 'bert.' 637 | load(model, prefix=start_prefix) 638 | if len(missing_keys) > 0: 639 | logger.info("Weights of {} not initialized from pretrained model: {}".format( 640 | model.__class__.__name__, missing_keys)) 641 | if len(unexpected_keys) > 0: 642 | logger.info("Weights from pretrained model not used in {}: {}".format( 643 | model.__class__.__name__, unexpected_keys)) 644 | if len(error_msgs) > 0: 645 | raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format( 646 | model.__class__.__name__, "\n\t".join(error_msgs))) 647 | return model 648 | 649 | 650 | class BertModel(BertPreTrainedModel): 651 | """BERT model ("Bidirectional Embedding Representations from a Transformer"). 652 | 653 | Params: 654 | config: a BertConfig class instance with the configuration to build a new model 655 | 656 | Inputs: 657 | `input_ids`: a torch.LongTensor of shape [batch_size, sequence_length] 658 | with the word token indices in the vocabulary(see the tokens preprocessing logic in the scripts 659 | `extract_features.py`, `run_classifier.py` and `run_squad.py`) 660 | `token_type_ids`: an optional torch.LongTensor of shape [batch_size, sequence_length] with the token 661 | types indices selected in [0, 1]. Type 0 corresponds to a `sentence A` and type 1 corresponds to 662 | a `sentence B` token (see BERT paper for more details). 663 | `attention_mask`: an optional torch.LongTensor of shape [batch_size, sequence_length] with indices 664 | selected in [0, 1]. It's a mask to be used if the input sequence length is smaller than the max 665 | input sequence length in the current batch. It's the mask that we typically use for attention when 666 | a batch has varying length sentences. 
667 |         `output_all_encoded_layers`: boolean which controls the content of the `encoded_layers` output as described below. Default: `True`.
668 | 
669 |     Outputs: Tuple of (encoded_layers, pooled_output)
670 |         `encoded_layers`: controlled by the `output_all_encoded_layers` argument:
671 |             - `output_all_encoded_layers=True`: outputs a list of the full sequences of encoded hidden states at the end
672 |                 of each attention block (i.e. 12 full sequences for BERT-base, 24 for BERT-large), each
673 |                 encoded hidden state is a torch.FloatTensor of size [batch_size, sequence_length, hidden_size],
674 |             - `output_all_encoded_layers=False`: outputs only the full sequence of hidden states corresponding
675 |                 to the last attention block of shape [batch_size, sequence_length, hidden_size],
676 |         `pooled_output`: a torch.FloatTensor of size [batch_size, hidden_size] which is the output of a
677 |             classifier pretrained on top of the hidden state associated with the first token of the
678 |             input (`[CLS]`) to train on the Next-Sentence task (see BERT's paper).
679 | 
680 |     Example usage:
681 |     ```python
682 |     # Already been converted into WordPiece token ids
683 |     input_ids = torch.LongTensor([[31, 51, 99], [15, 5, 0]])
684 |     input_mask = torch.LongTensor([[1, 1, 1], [1, 1, 0]])
685 |     token_type_ids = torch.LongTensor([[0, 0, 1], [0, 1, 0]])
686 | 
687 |     config = modeling.BertConfig(vocab_size_or_config_json_file=32000, hidden_size=768,
688 |         num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072)
689 | 
690 |     model = modeling.BertModel(config=config)
691 |     all_encoder_layers, pooled_output = model(input_ids, token_type_ids, input_mask)
692 |     ```
693 |     """
694 |     def __init__(self, config):
695 |         super(BertModel, self).__init__(config)
696 |         self.embeddings = BertEmbeddings(config)
697 |         self.encoder = BertEncoder(config)
698 |         self.pooler = BertPooler(config)
699 |         self.apply(self.init_bert_weights)
700 | 
701 |     def forward(self, input_ids, token_type_ids=None, attention_mask=None, output_all_encoded_layers=True):
702 |         if attention_mask is None:
703 |             attention_mask = torch.ones_like(input_ids)
704 |         if token_type_ids is None:
705 |             token_type_ids = torch.zeros_like(input_ids)
706 | 
707 |         # We create a 3D attention mask from a 2D tensor mask.
708 |         # Sizes are [batch_size, 1, 1, to_seq_length]
709 |         # So we can broadcast to [batch_size, num_heads, from_seq_length, to_seq_length]
710 |         # This attention mask is simpler than the triangular masking of causal attention
711 |         # used in OpenAI GPT, we just need to prepare the broadcast dimension here.
712 |         extended_attention_mask = attention_mask.unsqueeze(1).unsqueeze(2)
713 | 
714 |         # Since attention_mask is 1.0 for positions we want to attend and 0.0 for
715 |         # masked positions, this operation will create a tensor which is 0.0 for
716 |         # positions we want to attend and -10000.0 for masked positions.
717 |         # Since we are adding it to the raw scores before the softmax, this is
718 |         # effectively the same as removing these entirely.
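        # As a small worked illustration (not code from this repository): a mask row [1, 1, 0]
        # is broadcast to shape [batch_size, 1, 1, seq_len] by the two unsqueeze calls above,
        # and the transformation below maps it to [0.0, 0.0, -10000.0] along the last dimension;
        # added to the attention scores, the padded position's softmax weight is driven to
        # (effectively) zero.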
719 | extended_attention_mask = extended_attention_mask.to(dtype=next(self.parameters()).dtype) # fp16 compatibility 720 | extended_attention_mask = (1.0 - extended_attention_mask) * -10000.0 721 | 722 | embedding_output = self.embeddings(input_ids, token_type_ids) 723 | encoded_layers = self.encoder(embedding_output, 724 | extended_attention_mask, 725 | output_all_encoded_layers=output_all_encoded_layers) 726 | sequence_output = encoded_layers[-1] 727 | pooled_output = self.pooler(sequence_output) 728 | if not output_all_encoded_layers: 729 | encoded_layers = encoded_layers[-1] 730 | return encoded_layers, pooled_output 731 | 732 | 733 | class BertForPreTraining(BertPreTrainedModel): 734 | """BERT model with pre-training heads. 735 | This module comprises the BERT model followed by the two pre-training heads: 736 | - the masked language modeling head, and 737 | - the next sentence classification head. 738 | 739 | Params: 740 | config: a BertConfig class instance with the configuration to build a new model. 741 | 742 | Inputs: 743 | `input_ids`: a torch.LongTensor of shape [batch_size, sequence_length] 744 | with the word token indices in the vocabulary(see the tokens preprocessing logic in the scripts 745 | `extract_features.py`, `run_classifier.py` and `run_squad.py`) 746 | `token_type_ids`: an optional torch.LongTensor of shape [batch_size, sequence_length] with the token 747 | types indices selected in [0, 1]. Type 0 corresponds to a `sentence A` and type 1 corresponds to 748 | a `sentence B` token (see BERT paper for more details). 749 | `attention_mask`: an optional torch.LongTensor of shape [batch_size, sequence_length] with indices 750 | selected in [0, 1]. It's a mask to be used if the input sequence length is smaller than the max 751 | input sequence length in the current batch. It's the mask that we typically use for attention when 752 | a batch has varying length sentences. 753 | `masked_lm_labels`: optional masked language modeling labels: torch.LongTensor of shape [batch_size, sequence_length] 754 | with indices selected in [-1, 0, ..., vocab_size]. All labels set to -1 are ignored (masked), the loss 755 | is only computed for the labels set in [0, ..., vocab_size] 756 | `next_sentence_label`: optional next sentence classification loss: torch.LongTensor of shape [batch_size] 757 | with indices selected in [0, 1]. 758 | 0 => next sentence is the continuation, 1 => next sentence is a random sentence. 759 | 760 | Outputs: 761 | if `masked_lm_labels` and `next_sentence_label` are not `None`: 762 | Outputs the total_loss which is the sum of the masked language modeling loss and the next 763 | sentence classification loss. 764 | if `masked_lm_labels` or `next_sentence_label` is `None`: 765 | Outputs a tuple comprising 766 | - the masked language modeling logits of shape [batch_size, sequence_length, vocab_size], and 767 | - the next sentence classification logits of shape [batch_size, 2]. 
768 | 769 | Example usage: 770 | ```python 771 | # Already been converted into WordPiece token ids 772 | input_ids = torch.LongTensor([[31, 51, 99], [15, 5, 0]]) 773 | input_mask = torch.LongTensor([[1, 1, 1], [1, 1, 0]]) 774 | token_type_ids = torch.LongTensor([[0, 0, 1], [0, 1, 0]]) 775 | 776 | config = BertConfig(vocab_size_or_config_json_file=32000, hidden_size=768, 777 | num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072) 778 | 779 | model = BertForPreTraining(config) 780 | masked_lm_logits_scores, seq_relationship_logits = model(input_ids, token_type_ids, input_mask) 781 | ``` 782 | """ 783 | def __init__(self, config): 784 | super(BertForPreTraining, self).__init__(config) 785 | self.bert = BertModel(config) 786 | self.cls = BertPreTrainingHeads(config, self.bert.embeddings.word_embeddings.weight) 787 | self.apply(self.init_bert_weights) 788 | 789 | def forward(self, input_ids, token_type_ids=None, attention_mask=None, masked_lm_labels=None, next_sentence_label=None): 790 | sequence_output, pooled_output = self.bert(input_ids, token_type_ids, attention_mask, 791 | output_all_encoded_layers=False) 792 | prediction_scores, seq_relationship_score = self.cls(sequence_output, pooled_output) 793 | 794 | if masked_lm_labels is not None and next_sentence_label is not None: 795 | loss_fct = CrossEntropyLoss(ignore_index=-1) 796 | masked_lm_loss = loss_fct(prediction_scores.view(-1, self.config.vocab_size), masked_lm_labels.view(-1)) 797 | next_sentence_loss = loss_fct(seq_relationship_score.view(-1, 2), next_sentence_label.view(-1)) 798 | total_loss = masked_lm_loss + next_sentence_loss 799 | return total_loss 800 | else: 801 | return prediction_scores, seq_relationship_score 802 | 803 | 804 | class BertForMaskedLM(BertPreTrainedModel): 805 | """BERT model with the masked language modeling head. 806 | This module comprises the BERT model followed by the masked language modeling head. 807 | 808 | Params: 809 | config: a BertConfig class instance with the configuration to build a new model. 810 | 811 | Inputs: 812 | `input_ids`: a torch.LongTensor of shape [batch_size, sequence_length] 813 | with the word token indices in the vocabulary(see the tokens preprocessing logic in the scripts 814 | `extract_features.py`, `run_classifier.py` and `run_squad.py`) 815 | `token_type_ids`: an optional torch.LongTensor of shape [batch_size, sequence_length] with the token 816 | types indices selected in [0, 1]. Type 0 corresponds to a `sentence A` and type 1 corresponds to 817 | a `sentence B` token (see BERT paper for more details). 818 | `attention_mask`: an optional torch.LongTensor of shape [batch_size, sequence_length] with indices 819 | selected in [0, 1]. It's a mask to be used if the input sequence length is smaller than the max 820 | input sequence length in the current batch. It's the mask that we typically use for attention when 821 | a batch has varying length sentences. 822 | `masked_lm_labels`: masked language modeling labels: torch.LongTensor of shape [batch_size, sequence_length] 823 | with indices selected in [-1, 0, ..., vocab_size]. All labels set to -1 are ignored (masked), the loss 824 | is only computed for the labels set in [0, ..., vocab_size] 825 | 826 | Outputs: 827 | if `masked_lm_labels` is not `None`: 828 | Outputs the masked language modeling loss. 829 | if `masked_lm_labels` is `None`: 830 | Outputs the masked language modeling logits of shape [batch_size, sequence_length, vocab_size]. 
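
    For the loss path described above, a minimal sketch reusing the tensors from the
    `Example usage` block below (the label tensor is illustrative; positions set to -1 are
    ignored by the CrossEntropyLoss):
    ```python
    masked_lm_labels = torch.LongTensor([[-1, -1, 99], [-1, -1, -1]])
    masked_lm_loss = model(input_ids, token_type_ids, input_mask, masked_lm_labels)
    ```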
831 | 
832 |     Example usage:
833 |     ```python
834 |     # Already been converted into WordPiece token ids
835 |     input_ids = torch.LongTensor([[31, 51, 99], [15, 5, 0]])
836 |     input_mask = torch.LongTensor([[1, 1, 1], [1, 1, 0]])
837 |     token_type_ids = torch.LongTensor([[0, 0, 1], [0, 1, 0]])
838 | 
839 |     config = BertConfig(vocab_size_or_config_json_file=32000, hidden_size=768,
840 |         num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072)
841 | 
842 |     model = BertForMaskedLM(config)
843 |     masked_lm_logits_scores = model(input_ids, token_type_ids, input_mask)
844 |     ```
845 |     """
846 |     def __init__(self, config):
847 |         super(BertForMaskedLM, self).__init__(config)
848 |         self.bert = BertModel(config)
849 |         self.cls = BertOnlyMLMHead(config, self.bert.embeddings.word_embeddings.weight)
850 |         self.apply(self.init_bert_weights)
851 | 
852 |     def forward(self, input_ids, token_type_ids=None, attention_mask=None, masked_lm_labels=None):
853 |         sequence_output, _ = self.bert(input_ids, token_type_ids, attention_mask,
854 |                                        output_all_encoded_layers=False)
855 |         prediction_scores = self.cls(sequence_output)
856 | 
857 |         if masked_lm_labels is not None:
858 |             loss_fct = CrossEntropyLoss(ignore_index=-1)
859 |             masked_lm_loss = loss_fct(prediction_scores.view(-1, self.config.vocab_size), masked_lm_labels.view(-1))
860 |             return masked_lm_loss
861 |         else:
862 |             return prediction_scores
863 | 
864 | 
865 | class BertForNextSentencePrediction(BertPreTrainedModel):
866 |     """BERT model with the next sentence prediction head.
867 |     This module comprises the BERT model followed by the next sentence classification head.
868 | 
869 |     Params:
870 |         config: a BertConfig class instance with the configuration to build a new model.
871 | 
872 |     Inputs:
873 |         `input_ids`: a torch.LongTensor of shape [batch_size, sequence_length]
874 |             with the word token indices in the vocabulary (see the token preprocessing logic in the scripts
875 |             `extract_features.py`, `run_classifier.py` and `run_squad.py`)
876 |         `token_type_ids`: an optional torch.LongTensor of shape [batch_size, sequence_length] with the token
877 |             type indices selected in [0, 1]. Type 0 corresponds to a `sentence A` and type 1 corresponds to
878 |             a `sentence B` token (see BERT paper for more details).
879 |         `attention_mask`: an optional torch.LongTensor of shape [batch_size, sequence_length] with indices
880 |             selected in [0, 1]. It's a mask to be used if the input sequence length is smaller than the max
881 |             input sequence length in the current batch. It's the mask that we typically use for attention when
882 |             a batch has varying length sentences.
883 |         `next_sentence_label`: next sentence classification labels: torch.LongTensor of shape [batch_size]
884 |             with indices selected in [0, 1].
885 |             0 => next sentence is the continuation, 1 => next sentence is a random sentence.
886 | 
887 |     Outputs:
888 |         if `next_sentence_label` is not `None`:
889 |             Outputs the next sentence classification loss (a CrossEntropy loss over the
890 |             two `is-next` / `random` classes).
891 |         if `next_sentence_label` is `None`:
892 |             Outputs the next sentence classification logits of shape [batch_size, 2].
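
    For the loss path described above, a minimal sketch reusing the tensors from the
    `Example usage` block below (labels are illustrative; 0 = the second segment really follows
    the first, 1 = it is a random sentence):
    ```python
    next_sentence_label = torch.LongTensor([0, 1])
    next_sentence_loss = model(input_ids, token_type_ids, input_mask, next_sentence_label)
    ```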
893 | 
894 |     Example usage:
895 |     ```python
896 |     # Already been converted into WordPiece token ids
897 |     input_ids = torch.LongTensor([[31, 51, 99], [15, 5, 0]])
898 |     input_mask = torch.LongTensor([[1, 1, 1], [1, 1, 0]])
899 |     token_type_ids = torch.LongTensor([[0, 0, 1], [0, 1, 0]])
900 | 
901 |     config = BertConfig(vocab_size_or_config_json_file=32000, hidden_size=768,
902 |         num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072)
903 | 
904 |     model = BertForNextSentencePrediction(config)
905 |     seq_relationship_logits = model(input_ids, token_type_ids, input_mask)
906 |     ```
907 |     """
908 |     def __init__(self, config):
909 |         super(BertForNextSentencePrediction, self).__init__(config)
910 |         self.bert = BertModel(config)
911 |         self.cls = BertOnlyNSPHead(config)
912 |         self.apply(self.init_bert_weights)
913 | 
914 |     def forward(self, input_ids, token_type_ids=None, attention_mask=None, next_sentence_label=None):
915 |         _, pooled_output = self.bert(input_ids, token_type_ids, attention_mask,
916 |                                      output_all_encoded_layers=False)
917 |         seq_relationship_score = self.cls(pooled_output)
918 | 
919 |         if next_sentence_label is not None:
920 |             loss_fct = CrossEntropyLoss(ignore_index=-1)
921 |             next_sentence_loss = loss_fct(seq_relationship_score.view(-1, 2), next_sentence_label.view(-1))
922 |             return next_sentence_loss
923 |         else:
924 |             return seq_relationship_score
925 | 
926 | 
927 | class BertForSequenceClassification(BertPreTrainedModel):
928 |     """BERT model for classification, customized for this repository: besides the standard pooled `[CLS]` output, the forward pass builds a title-guided attention vector, an attention vector guided by the pooled `sdp_ids` sequence, and two entity vectors weighted by the Gaussian position distributions `P_gauss1_bs`/`P_gauss2_bs`, then concatenates these five [batch_size, hidden_size] vectors for a linear classifier (a usage sketch with dummy tensors appears further below).
929 |     """
930 |     def __init__(self, config, num_labels):
931 |         super(BertForSequenceClassification, self).__init__(config)
932 |         self.num_labels = num_labels
933 |         self.bert = BertModel(config)
934 |         self.dropout = nn.Dropout(config.hidden_dropout_prob)
935 |         self.activation = ACT2FN[config.hidden_act]
936 |         self.activation_final = nn.Tanh()
937 |         self.classifier = nn.Linear(5 * config.hidden_size, num_labels)  # five concatenated [hidden_size] vectors
938 |         self.apply(self.init_bert_weights)
939 | 
940 | 
941 |     def forward(self, title_ids, title_type_ids=None, title_mask=None, input_ids=None, token_type_ids=None, attention_mask=None, sdp_ids=None, sdp_type_ids=None, sdp_mask=None, P_gauss1_bs=None, P_gauss2_bs=None, labels=None):
942 |         # Token-level and pooled representations of the candidate sentence
943 |         input_token_output, pooled_output = self.bert(input_ids, token_type_ids, attention_mask, output_all_encoded_layers=False)
944 |         input_token_output = self.dropout(input_token_output)
945 |         input_token_output = self.activation(input_token_output)
946 |         # Attention over the sentence tokens, using the pooled title representation as the query
947 |         _, title_output = self.bert(title_ids, title_type_ids, title_mask, output_all_encoded_layers=False)
948 |         title_query = title_output.unsqueeze(dim=1)
949 |         t_u = torch.matmul(title_query, input_token_output.transpose(-1, -2))
950 |         t_alpha = nn.Softmax(dim=-1)(t_u)
951 |         t_v = torch.matmul(t_alpha, input_token_output)
952 |         output_t = t_v.squeeze(dim=1)
953 |         output_t = self.activation_final(output_t)
954 |         output_t = self.dropout(output_t)
955 |         # Attention over the sentence tokens, using the pooled representation of the `sdp_ids` sequence as the query
956 |         _, sdp_output = self.bert(sdp_ids, sdp_type_ids, sdp_mask, output_all_encoded_layers=False)
957 |         sdp_query = sdp_output.unsqueeze(dim=1)
958 |         u = torch.matmul(sdp_query, input_token_output.transpose(-1, -2))
959 |         alpha = nn.Softmax(dim=-1)(u)
960 |         v = torch.matmul(alpha, input_token_output)
961 |         output_g = v.squeeze(dim=1)
962 |         output_g = self.activation_final(output_g)
963 |         output_g = self.dropout(output_g)
964 |         # Entity vectors: Gaussian position weights applied to the sentence token representations
965 |         P_gauss1_bs = P_gauss1_bs.unsqueeze(dim=1)
966 |         P_gauss2_bs = P_gauss2_bs.unsqueeze(dim=1)
967 |         gauss_entity1 = torch.matmul(P_gauss1_bs, input_token_output)
968 |         gauss_entity2 = torch.matmul(P_gauss2_bs, input_token_output)
969 |         gauss_entity1 = gauss_entity1.squeeze(dim=1)
970 |         gauss_entity2 = gauss_entity2.squeeze(dim=1)
971 |         gauss_entity1 = self.activation_final(gauss_entity1)
972 |         gauss_entity2 = self.activation_final(gauss_entity2)
973 |         gauss_entity1 = self.dropout(gauss_entity1)
974 |         gauss_entity2 = self.dropout(gauss_entity2)
975 |         # Concatenate the pooled [CLS] output with the four attended vectors and classify
976 |         pooled_output = self.dropout(pooled_output)
977 |         output = torch.cat((output_t, pooled_output, gauss_entity1, gauss_entity2, output_g), -1)
978 |         output = self.dropout(output)
979 |         logits = self.classifier(output)
980 | 
981 |         if labels is not None:
982 |             loss_fct = CrossEntropyLoss()
983 |             loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
984 |             return loss
985 |         else:
986 |             return logits
987 | 
988 | 
989 | class BertForMultipleChoice(BertPreTrainedModel):
990 |     """BERT model for multiple choice tasks.
991 |     This module is composed of the BERT model with a linear layer on top of
992 |     the pooled output.
993 | 
994 |     Params:
995 |         `config`: a BertConfig class instance with the configuration to build a new model.
996 |         `num_choices`: the number of choices for the classifier. Default = 2.
997 | 
998 |     Inputs:
999 |         `input_ids`: a torch.LongTensor of shape [batch_size, num_choices, sequence_length]
1000 |             with the word token indices in the vocabulary (see the token preprocessing logic in the scripts
1001 |             `extract_features.py`, `run_classifier.py` and `run_squad.py`)
1002 |         `token_type_ids`: an optional torch.LongTensor of shape [batch_size, num_choices, sequence_length]
1003 |             with the token type indices selected in [0, 1]. Type 0 corresponds to a `sentence A`
1004 |             and type 1 corresponds to a `sentence B` token (see BERT paper for more details).
1005 |         `attention_mask`: an optional torch.LongTensor of shape [batch_size, num_choices, sequence_length] with indices
1006 |             selected in [0, 1]. It's a mask to be used if the input sequence length is smaller than the max
1007 |             input sequence length in the current batch. It's the mask that we typically use for attention when
1008 |             a batch has varying length sentences.
1009 |         `labels`: labels for the classification output: torch.LongTensor of shape [batch_size]
1010 |             with indices selected in [0, ..., num_choices].
1011 | 
1012 |     Outputs:
1013 |         if `labels` is not `None`:
1014 |             Outputs the CrossEntropy classification loss of the output with the labels.
1015 |         if `labels` is `None`:
1016 |             Outputs the classification logits of shape [batch_size, num_choices].
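
    A minimal usage sketch for the customized `BertForSequenceClassification` defined above,
    since its own docstring carries no example. All tensors are dummies; the sequence lengths,
    the six-way label set and the Gaussian construction of `P_gauss1_bs`/`P_gauss2_bs` are
    illustrative assumptions, not values taken from this repository's preprocessing:
    ```python
    import torch

    batch_size, seq_len, title_len, sdp_len = 2, 16, 8, 6
    config = BertConfig(vocab_size_or_config_json_file=32000, hidden_size=768,
        num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072)
    model = BertForSequenceClassification(config, num_labels=6)

    input_ids = torch.randint(1, 32000, (batch_size, seq_len))
    attention_mask = torch.ones_like(input_ids)
    token_type_ids = torch.zeros_like(input_ids)
    title_ids = torch.randint(1, 32000, (batch_size, title_len))
    sdp_ids = torch.randint(1, 32000, (batch_size, sdp_len))

    # Gaussian position weights centred on two (hypothetical) entity positions, normalised
    # over the sentence so each entity vector is a weighted average of token states
    positions = torch.arange(seq_len, dtype=torch.float)
    P_gauss1_bs = torch.softmax(-(positions - 3.0) ** 2 / (2 * 2.0 ** 2), dim=-1).expand(batch_size, seq_len)
    P_gauss2_bs = torch.softmax(-(positions - 10.0) ** 2 / (2 * 2.0 ** 2), dim=-1).expand(batch_size, seq_len)

    labels = torch.LongTensor([0, 3])  # illustrative relation label ids in [0, num_labels)
    loss = model(title_ids, None, None, input_ids, token_type_ids, attention_mask,
        sdp_ids, None, None, P_gauss1_bs, P_gauss2_bs, labels)
    logits = model(title_ids, None, None, input_ids, token_type_ids, attention_mask,
        sdp_ids, None, None, P_gauss1_bs, P_gauss2_bs)
    ```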
1017 | 1018 | Example usage: 1019 | ```python 1020 | # Already been converted into WordPiece token ids 1021 | input_ids = torch.LongTensor([[[31, 51, 99], [15, 5, 0]], [[12, 16, 42], [14, 28, 57]]]) 1022 | input_mask = torch.LongTensor([[[1, 1, 1], [1, 1, 0]],[[1,1,0], [1, 0, 0]]]) 1023 | token_type_ids = torch.LongTensor([[[0, 0, 1], [0, 1, 0]],[[0, 1, 1], [0, 0, 1]]]) 1024 | config = BertConfig(vocab_size_or_config_json_file=32000, hidden_size=768, 1025 | num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072) 1026 | 1027 | num_choices = 2 1028 | 1029 | model = BertForMultipleChoice(config, num_choices) 1030 | logits = model(input_ids, token_type_ids, input_mask) 1031 | ``` 1032 | """ 1033 | def __init__(self, config, num_choices): 1034 | super(BertForMultipleChoice, self).__init__(config) 1035 | self.num_choices = num_choices 1036 | self.bert = BertModel(config) 1037 | self.dropout = nn.Dropout(config.hidden_dropout_prob) 1038 | self.classifier = nn.Linear(config.hidden_size, 1) 1039 | self.apply(self.init_bert_weights) 1040 | 1041 | def forward(self, input_ids, token_type_ids=None, attention_mask=None, labels=None): 1042 | flat_input_ids = input_ids.view(-1, input_ids.size(-1)) 1043 | flat_token_type_ids = token_type_ids.view(-1, token_type_ids.size(-1)) 1044 | flat_attention_mask = attention_mask.view(-1, attention_mask.size(-1)) 1045 | _, pooled_output = self.bert(flat_input_ids, flat_token_type_ids, flat_attention_mask, output_all_encoded_layers=False) 1046 | pooled_output = self.dropout(pooled_output) 1047 | logits = self.classifier(pooled_output) 1048 | reshaped_logits = logits.view(-1, self.num_choices) 1049 | 1050 | if labels is not None: 1051 | loss_fct = CrossEntropyLoss() 1052 | loss = loss_fct(reshaped_logits, labels) 1053 | return loss 1054 | else: 1055 | return reshaped_logits 1056 | 1057 | 1058 | class BertForTokenClassification(BertPreTrainedModel): 1059 | """BERT model for token-level classification. 1060 | This module is composed of the BERT model with a linear layer on top of 1061 | the full hidden state of the last layer. 1062 | 1063 | Params: 1064 | `config`: a BertConfig class instance with the configuration to build a new model. 1065 | `num_labels`: the number of classes for the classifier. Default = 2. 1066 | 1067 | Inputs: 1068 | `input_ids`: a torch.LongTensor of shape [batch_size, sequence_length] 1069 | with the word token indices in the vocabulary(see the tokens preprocessing logic in the scripts 1070 | `extract_features.py`, `run_classifier.py` and `run_squad.py`) 1071 | `token_type_ids`: an optional torch.LongTensor of shape [batch_size, sequence_length] with the token 1072 | types indices selected in [0, 1]. Type 0 corresponds to a `sentence A` and type 1 corresponds to 1073 | a `sentence B` token (see BERT paper for more details). 1074 | `attention_mask`: an optional torch.LongTensor of shape [batch_size, sequence_length] with indices 1075 | selected in [0, 1]. It's a mask to be used if the input sequence length is smaller than the max 1076 | input sequence length in the current batch. It's the mask that we typically use for attention when 1077 | a batch has varying length sentences. 1078 | `labels`: labels for the classification output: torch.LongTensor of shape [batch_size, sequence_length] 1079 | with indices selected in [0, ..., num_labels]. 1080 | 1081 | Outputs: 1082 | if `labels` is not `None`: 1083 | Outputs the CrossEntropy classification loss of the output with the labels. 
1084 | if `labels` is `None`: 1085 | Outputs the classification logits of shape [batch_size, sequence_length, num_labels]. 1086 | 1087 | Example usage: 1088 | ```python 1089 | # Already been converted into WordPiece token ids 1090 | input_ids = torch.LongTensor([[31, 51, 99], [15, 5, 0]]) 1091 | input_mask = torch.LongTensor([[1, 1, 1], [1, 1, 0]]) 1092 | token_type_ids = torch.LongTensor([[0, 0, 1], [0, 1, 0]]) 1093 | 1094 | config = BertConfig(vocab_size_or_config_json_file=32000, hidden_size=768, 1095 | num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072) 1096 | 1097 | num_labels = 2 1098 | 1099 | model = BertForTokenClassification(config, num_labels) 1100 | logits = model(input_ids, token_type_ids, input_mask) 1101 | ``` 1102 | """ 1103 | def __init__(self, config, num_labels): 1104 | super(BertForTokenClassification, self).__init__(config) 1105 | self.num_labels = num_labels 1106 | self.bert = BertModel(config) 1107 | self.dropout = nn.Dropout(config.hidden_dropout_prob) 1108 | self.classifier = nn.Linear(config.hidden_size, num_labels) 1109 | self.apply(self.init_bert_weights) 1110 | 1111 | def forward(self, input_ids, token_type_ids=None, attention_mask=None, labels=None): 1112 | sequence_output, _ = self.bert(input_ids, token_type_ids, attention_mask, output_all_encoded_layers=False) 1113 | sequence_output = self.dropout(sequence_output) 1114 | logits = self.classifier(sequence_output) 1115 | 1116 | if labels is not None: 1117 | loss_fct = CrossEntropyLoss() 1118 | # Only keep active parts of the loss 1119 | if attention_mask is not None: 1120 | active_loss = attention_mask.view(-1) == 1 1121 | active_logits = logits.view(-1, self.num_labels)[active_loss] 1122 | active_labels = labels.view(-1)[active_loss] 1123 | loss = loss_fct(active_logits, active_labels) 1124 | else: 1125 | loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1)) 1126 | return loss 1127 | else: 1128 | return logits 1129 | 1130 | 1131 | class BertForQuestionAnswering(BertPreTrainedModel): 1132 | """BERT model for Question Answering (span extraction). 1133 | This module is composed of the BERT model with a linear layer on top of 1134 | the sequence output that computes start_logits and end_logits 1135 | 1136 | Params: 1137 | `config`: a BertConfig class instance with the configuration to build a new model. 1138 | 1139 | Inputs: 1140 | `input_ids`: a torch.LongTensor of shape [batch_size, sequence_length] 1141 | with the word token indices in the vocabulary(see the tokens preprocessing logic in the scripts 1142 | `extract_features.py`, `run_classifier.py` and `run_squad.py`) 1143 | `token_type_ids`: an optional torch.LongTensor of shape [batch_size, sequence_length] with the token 1144 | types indices selected in [0, 1]. Type 0 corresponds to a `sentence A` and type 1 corresponds to 1145 | a `sentence B` token (see BERT paper for more details). 1146 | `attention_mask`: an optional torch.LongTensor of shape [batch_size, sequence_length] with indices 1147 | selected in [0, 1]. It's a mask to be used if the input sequence length is smaller than the max 1148 | input sequence length in the current batch. It's the mask that we typically use for attention when 1149 | a batch has varying length sentences. 1150 | `start_positions`: position of the first token for the labeled span: torch.LongTensor of shape [batch_size]. 1151 | Positions are clamped to the length of the sequence and position outside of the sequence are not taken 1152 | into account for computing the loss. 
1153 | `end_positions`: position of the last token for the labeled span: torch.LongTensor of shape [batch_size]. 1154 | Positions are clamped to the length of the sequence and position outside of the sequence are not taken 1155 | into account for computing the loss. 1156 | 1157 | Outputs: 1158 | if `start_positions` and `end_positions` are not `None`: 1159 | Outputs the total_loss which is the sum of the CrossEntropy loss for the start and end token positions. 1160 | if `start_positions` or `end_positions` is `None`: 1161 | Outputs a tuple of start_logits, end_logits which are the logits respectively for the start and end 1162 | position tokens of shape [batch_size, sequence_length]. 1163 | 1164 | Example usage: 1165 | ```python 1166 | # Already been converted into WordPiece token ids 1167 | input_ids = torch.LongTensor([[31, 51, 99], [15, 5, 0]]) 1168 | input_mask = torch.LongTensor([[1, 1, 1], [1, 1, 0]]) 1169 | token_type_ids = torch.LongTensor([[0, 0, 1], [0, 1, 0]]) 1170 | 1171 | config = BertConfig(vocab_size_or_config_json_file=32000, hidden_size=768, 1172 | num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072) 1173 | 1174 | model = BertForQuestionAnswering(config) 1175 | start_logits, end_logits = model(input_ids, token_type_ids, input_mask) 1176 | ``` 1177 | """ 1178 | def __init__(self, config): 1179 | super(BertForQuestionAnswering, self).__init__(config) 1180 | self.bert = BertModel(config) 1181 | # TODO check with Google if it's normal there is no dropout on the token classifier of SQuAD in the TF version 1182 | # self.dropout = nn.Dropout(config.hidden_dropout_prob) 1183 | self.qa_outputs = nn.Linear(config.hidden_size, 2) 1184 | self.apply(self.init_bert_weights) 1185 | 1186 | def forward(self, input_ids, token_type_ids=None, attention_mask=None, start_positions=None, end_positions=None): 1187 | sequence_output, _ = self.bert(input_ids, token_type_ids, attention_mask, output_all_encoded_layers=False) 1188 | logits = self.qa_outputs(sequence_output) 1189 | start_logits, end_logits = logits.split(1, dim=-1) 1190 | start_logits = start_logits.squeeze(-1) 1191 | end_logits = end_logits.squeeze(-1) 1192 | 1193 | if start_positions is not None and end_positions is not None: 1194 | # If we are on multi-GPU, split add a dimension 1195 | if len(start_positions.size()) > 1: 1196 | start_positions = start_positions.squeeze(-1) 1197 | if len(end_positions.size()) > 1: 1198 | end_positions = end_positions.squeeze(-1) 1199 | # sometimes the start/end positions are outside our model inputs, we ignore these terms 1200 | ignored_index = start_logits.size(1) 1201 | start_positions.clamp_(0, ignored_index) 1202 | end_positions.clamp_(0, ignored_index) 1203 | 1204 | loss_fct = CrossEntropyLoss(ignore_index=ignored_index) 1205 | start_loss = loss_fct(start_logits, start_positions) 1206 | end_loss = loss_fct(end_logits, end_positions) 1207 | total_loss = (start_loss + end_loss) / 2 1208 | return total_loss 1209 | else: 1210 | return start_logits, end_logits 1211 | -------------------------------------------------------------------------------- /model.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CongSun-dlut/CPI_extraction/cda9f2b7f6fb757b999744cc9a95be205e2cc265/model.jpg --------------------------------------------------------------------------------