├── .gitignore
├── README.md
├── data
│   ├── dev.key
│   ├── dev.txt
│   ├── pretrained-embeds.pkl
│   ├── test-hidden.txt
│   ├── train.txt
│   └── vocab.txt
├── gtnlplib
│   ├── __init__.py
│   ├── constants.py
│   ├── data_tools.py
│   ├── evaluation.py
│   ├── feat_extractors.py
│   ├── neural_net.py
│   ├── parsing.py
│   └── utils.py
├── make-submission.sh
├── pset4.ipynb
├── tests
│   └── test_parser.py
└── text-answers.md

/.gitignore:
--------------------------------------------------------------------------------
1 | *.pyc
2 | *.tar.gz
3 | *.ipynb_checkpoints

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Deep Dependency Parsing in Pytorch
2 |
3 | This problem set walks through the implementation of a high-quality dependency parser in Pytorch.
4 | Comments and hints abound, and each deliverable has a specific unit test that can be run with nosetests.
5 | Please do not post solutions to the problems on the internet.

--------------------------------------------------------------------------------
/data/test-hidden.txt:
--------------------------------------------------------------------------------
1 | Trading was heavy at about one billion shares , compared with ###.# million Friday .
2 | But the session was orderly , in contrast to the market 's four-day closure after the #### crash .
3 | Richard Chenevix-Trench , a director at Hong Kong-based Baring International Fund Managers Ltd. , said the market probably has n't hit bottom yet but is close .
4 | In Australia , Sydney 's All Ordinaries index closed at ####.# , down #.# % , its biggest drop since October #### .
5 | London 's Financial Times-Stock Exchange ###-share index , the most closely watched market barometer , ended at its intraday high of ####.# , down ##.# , or #.# % .
6 | At its low , shortly before Wall Street opened , it was off more than ### points .
7 | The Financial Times ##-share index closed ##.# points lower at ####.# .
8 | Volume more than doubled to ###.# million shares from ###.# million Friday .
9 | Prices on the Frankfurt Stock Exchange tumbled in heavy trading .
10 | The decline in the German Stock Index of ###.## points , or ##.# % , to ####.## was the Frankfurt market 's steepest fall ever .
11 | Retail investors dumped holdings on a massive scale , pushing some blue-chip shares down as much as ## % .
12 | To make them directly comparable , each index is based on the close of #### equaling ### .
13 | The percentage change is since year-end .
14 | We 've always thought that Mr. Wright underestimated California 's vitality , but maybe the state 's la-la factions are starting to overwhelm the forces that made it such a significant place .
15 | Interestingly , the Environmental Defense Fund is having nothing to do with this one .
16 | Not only Californians but all Americans would pay if this thing passed .
17 | The initiative bars the sale of any crops in California that do n't meet the initiative 's standards .
18 | Kansas wheat farmers and Florida fruit growers would have to adjust or give up the California market .
19 | In other words , California is presuming to take control of the nation 's farm policy .
20 | Consider the greenhouse-effect provision .
21 | Even if one buys into the whole greenhouse theory , it is inconceivable that reductions in a single state could have any impact on what is billed as a global problem .
22 | But if rational science and economics have nothing to do with the new environment initiative , what is going on ?
23 | The key here is the ambition of state Attorney General John Van de Kamp .
24 | The initiative seems to have been crafted to include all the hot issues that set off the wealthy Hollywood weepers who donate money .
25 | And it allows Mr. Van de Kamp to get around campaign spending limits .
26 | This initiative is being labeled The Big Green , but maybe it should be called The Big Greenback .
27 | -LRB- The Republican candidate , Sen. Pete Wilson , is playing the initiative fundraising game too , sponsoring his own crime initiative . -RRB-
28 | While it is possible that the Big Green initiative will be ruled unconstitutional , it is of course conceivable that in modern California it could slide through .
29 | This is the state that recently passed the Prop. ## anti-toxic initiative .
30 | If this new proposal ever does become law , the green lobby will benefit directly .
31 | The initiative creates a free floating state environmental officer to sue companies or government agencies that do things he does n't like .
32 | That means the NRDC and such groups no longer would have to spend as much money on litigation ; taxpayers would bear the cost .
33 | Of course the state 's liberals are not yet a nation unto themselves .
34 | Then exhilaration .
35 | Spurred by waves of large-scale buying in blue-chip stocks , the Dow Jones Industrial Average rallied yesterday and erased about a half of Friday 's ###.##-point plunge , gaining ##.## to ####.## .
36 | While the advance cheered investors who feared a ####-style crash would occur yesterday , it was strictly a big-stock rally fed by huge buying by bargain-hunting institutions and program traders .
37 | The Nasdaq OTC index closed down #.## to ###.## .
38 | Transports plunged on takeover disappointments in two airline stocks , UAL and AMR , which each fell more than ## % when they reopened for trading yesterday after being suspended Friday afternoon .
39 | Overall , `` this is a pleasant rally but it 's very selective , '' said Arthur Cashin Jr. , a veteran PaineWebber Inc. trader at the Big Board .
40 | It 's just a strange feeling .
41 | I do n't think anyone left the place whistling Dixie . ''
42 | But to traders , it looked like disaster on the #:## a.m. opening bell .
43 | At #:## , Procter & Gamble -- one of the most important Dow bellwethers of late -- opened down # #/# to ### .
44 | The Dow dropped to a quick ##-point loss , and to many traders it looked as if stocks were headed for yet another big tumble .
45 | Then , to make matters worse , computerized sell programs kicked in , hammering stocks into steeper losses .
46 | There was heavy stock-index arbitrage , as traders sold big baskets of stock and bought stock-index futures to profit from the price discrepancies between the two markets .
47 | This was a hangover from Friday , when Standard & Poor 's ###-stock index futures had closed at a sharp discount to stocks .
48 | They did n't .
49 | The Dow accelerated its slide , losing ##.## in the first ## minutes of trading .
50 | Then at ##:## the Dow suddenly started to rebound , and when it shot upward it did so even faster than the early-morning fall .
51 | All the selling had pushed stocks to such cheap values that big investment banks and major money management firms started buying stocks heavily .
52 | Trading in Walt Disney Co. particularly caught traders ' eyes .
53 | According to Big Board officials , Disney had one of the biggest sell-order imbalances on Friday ; it was one of the seven stocks that could n't finish trading that day .
54 | But then it shot upward # #/# as Goldman , Sachs & Co. stepped in and bought , traders said .
55 | Around Wall Street , trading desks were relieved that they could at least play the market yesterday , in contrast to Friday 's gridlock .
56 | We put some orders together .
57 | There was n't a lot of panic selling , either domestically or internationally ... .
58 | Not like Friday where they just took -LCB- the market -RCB- apart . ''
59 | Grinned Griffith Peck , a trader in Shearson Lehman Hutton Inc. 's OTC department : `` I tell you , this market acts healthy . ''
60 | Around him , scores of traders seemed to get a burst of energy ; their boss broke out bottles of Perrier water to cool them off .
61 | Among Big Board specialists , the cry was `` Pull your offers '' -- meaning that specialists soon expected to get higher prices for their shares .
62 | But not everybody was making money .
63 | The carnage on the Chicago Board Options Exchange , the nation 's major options market , was heavy after the trading in S&P ### stock-index options was halted Friday .
64 | `` They were absolutely killed , slaughtered , '' said one Chicago-based options trader .
65 | Some traders said that the closely watched Major Market Index , whose ## stocks mimic the Dow industrials , did n't lead yesterday 's big rally .
66 | Then when the market was at a technical level to buy , they came in with a vengeance . ''
67 | However , according to one analyst , the timing of Major Market Index futures buying just before the turnaround was similar to that of Terrible Tuesday .
68 | `` Futures were pulling the stock market higher , '' said Donald Selkin , head of stock-index futures research at Prudential-Bache Securities Inc .
69 | Specialists were criticized for their inability to maintain orderly markets during the Friday plunge .
70 | According to a Big Board official , while many stocks opened late , there were subsequent trading halts in only three issues -- AMR , Merck and Valero Energy .
71 | Merck is one of the most important stocks in the Major Market Index .
72 | No sector of the market has been spared during the past two days ' gyrations .
73 | For example , Halliburton jumped # #/# to ## , Schlumberger rose # #/# to ## #/# and Baker Hughes rose # #/# to ## .
74 | Because of the UAL and AMR tumbles , airlines were the weakest sector of the market yesterday .
75 | Philip Morris was the Big Board 's most active issue , rising # #/# to ## #/# on nearly eight million shares .
76 | Merrill Lynch added # #/# to ## , PaineWebber rose #/# to ## #/# and Bear Stearns rose #/# to ## #/# .
77 | Federal National Mortgage Association , a recently hot stock , climbed # to ### on nearly #.# million shares .
78 | Mr. Phelan said that program trading strategies were n't responsible for triggering Friday 's decline despite a jump in the use of the computer-driven strategies in recent months .
79 | Mr. Phelan expressed relief that the market rebounded yesterday .
80 | `` It was the opposite of what happened on Oct. ## .
81 | They used their judgment .
82 | Instead , they bought on weakness and sold into the strength , which kept the market orderly .
83 | Maybe they learned from experience . ''
84 | Wall Street traders on Friday had complained about the trading suspensions .
85 | James A. White and Sonja Steptoe contributed to this article .
86 | West Germany 's Green Party joined its ideological soulmates Jeremy Rifkin and the Christic Institute in the legal battle to ground the Atlantis shuttle and its plutonium-powered Galileo probe to Jupiter .
87 | Of course it was .
88 | NASA should now sue for fines against all three politico-plaintiffs , foreign and domestic , for bringing this mischievous case .
89 | A House-Senate conference approved a permanent smoking ban on all domestic airline routes within the continental U.S. and on all flights of six hours or less to Alaska and Hawaii .
90 | The restrictions would cover all but a small percentage of domestic air traffic , and represent a major expansion of the current smoking ban on flights of two hours or less .
91 | From the outset , the tobacco industry has been uncertain as to what strategy to follow .
92 | `` Anybody can vote as they want , '' said Rep. William Lehman -LRB- D. , Fla. -RRB- , head of the House conferees .
93 | Leery of the costs -- and , critics say , competition -- the airlines have sought to gain leverage over the city of Denver .
94 | The industry sought to impose conditions that would have delayed funds for the project until Denver and the airlines had agreed to leases for ## % of the gates .
95 | Congress previously cut six airports this year .
96 | And federal-formula grants for mass transit would be effectively frozen at $ #.### billion , or $ ## million more than last fiscal year .
97 | Sony two weeks ago agreed to acquire Columbia for $ #.# billion , or $ ## a share .
98 | Warner sued Sony and Guber-Peters late last week ; Sony and Guber-Peters have countersued , charging Warner with attempting to interfere in Sony 's acquisition of the two companies .
99 | Guber-Peters 's net income in the latest quarter compared with a net loss of $ #.# million , or ## cents a share , in the year-earlier period .
100 | Officials at Aristech , based in Pittsburgh , declined comment .
101 | Congress has been critical of the Bush administration for not sending enough aid to Poland , so it is getting ready to send its own version of a CARE package .
102 | Last month , the Senate voted to send a delegation of congressional staffers to Poland to assist its legislature , the Sejm , in democratic procedures .
103 | Senator Pete Domenici calls this effort `` the first gift of democracy . ''
104 | Maybe after the staffers explain their work to the Poles , they 'd be willing to come back and do the same for the American people .
105 | In a sharply weaker London market yesterday , Waterford shares were down ## pence at ## pence -LRB- ## cents -RRB- .
106 | There were n't any extraordinary items .
107 | Sales for the total group rose ## % to ###.# million Irish punts compared with ###.# million Irish punts a year ago .
108 | Waterford has decided against paying an interim dividend .
109 | Waterford said the appointment of a new management team and the signing of a comprehensive labor agreement are expected to enhance the company 's long-term prospects .
110 | The sudden `` flight to quality '' that triggered Friday 's explosive bond-market rally was reversed yesterday in a `` flight from quality '' rout .
111 | `` It was a pretty wild day .
112 | Our markets were closely tied to the stock market , '' said Joel Kazis , manager of trading at Smith Barney , Harris Upham & Co .
113 | `` Friday 's flight to quality was no longer needed once the stock market found its legs , '' he said .
114 | That caused investors to flee stocks and buy high-quality Treasury bonds , which are safer than other types of securities .
115 | Contributing to the selling pressure were dispatches by several investment firms advising clients to boost their stock holdings and reduce the size of their cash or bond portfolios .
116 | The rate is considered an early signal of changes in Fed policy .
117 | In fact , some economists contend that the latest easing started last week .
118 | The Treasury 's benchmark ##-year bond , ended about # #/# points lower , or down about $ ##.## for each $ #,### face amount .
119 | After Treasury bill rates plummeted as much as #.## percentage point on Friday , they gave back three-fourths of that amount yesterday .
120 | The bond-equivalent yield on three-month Treasury bills , for example , was quoted late yesterday at #.## % , compared with #.## % Friday .
121 | Investment-grade corporate bonds , mortgage-backed securities and municipal bonds also fell .
122 | But prices of junk bonds , which were battered Friday in near standstill trading , rebounded to post small gains after a volatile trading session .
123 | Junk bonds opened as much as four points lower but staged a modest comeback as stock prices firmed .
124 | Some traders said the high-yield market was helped by active institutional buying .
125 | The benchmark ##-year bond , for example rose one point in early Japanese trading in reaction to a quick ###-point drop in the Tokyo stock market .
126 | But as Japanese stocks rebounded , Treasurys retreated and ended just modestly higher .
127 | Many U.S. trading operations , wanting to keep a watchful eye on Japanese trading as an indication of where U.S. trading would begin , were fully staffed during the Tokyo trading session .
128 | `` Most of the action was during the night session , '' said Michael Moore , trading manager at Continental Bank .
129 | Jay Goldinger , who often trades overnight for Capital Insight Inc. , Beverly Hills , Calif. , said trading in Tokyo was `` very active '' but highly volatile .
130 | In Tokyo , trading is halted during lunchtime .
131 | The August trade deficit is expected to have widened to $ #.# billion from $ #.## billion in July .
132 | A widening of that magnitude , said one New York trader , is `` not a favorable number ... .
133 | It could do damage to us . ''
134 | Resolution Funding is a division of Resolution Trust Corp. , the new federal agency created to bail out the nation 's troubled thrifts .
135 | `` There 's lots of supply , '' the New York trader said .
136 | `` We have a couple or three tough weeks coming . ''
137 | The benchmark ##-year Treasury bond was quoted late at a price of ### ##/## , compared with a closing price of ### ##/## Friday .
138 | The yield on the benchmark issue rose to #.## % from #.## % .
139 | Short-term interest rates fell yesterday at the government 's weekly Treasury bill auction .
140 | The average discount rate was #.## % on new six-month bills , the lowest since the average of #.## % at the auction on July ## , #### .
141 | Here are auction details :
142 | Rates are determined by the difference between the purchase price and face value .
143 | Thus , higher bidding narrows the investor 's return while lower bidding widens it .
144 | Corporate Issues
145 | Investment-grade corporate bonds ended one to # #/# point lower .
146 | Foreign bonds surged as the dollar weakened against most major currencies .
147 | The yield was #.### % .
148 | Mortgage securities gave up most of Friday 's gains as active issues ended ##/## to ##/## point lower .
149 | On Friday , mortgage issues gained as much as # #/## .
150 | Late yesterday Ginnie Mae # % securities were yielding #.## % to a ##-year average life assumption , as the spread above the Treasury ##-year note narrowed #.## percentage point to #.## .
151 | Municipals
152 | Rebounding stocks and weaker Treasury prices drove municipal bonds #/# to #/# point lower in late dealings .
153 | The session losses left municipal dollar bonds close to where they were before the ###.##-point drop in the Dow Jones Industrial Average Friday prompted a capital markets rally .
154 | Professionals dominated municipal trading throughout the session .
155 | New Jersey Turnpike Authority 's #.## % issue of #### was off #/# at ## #/# bid , yielding #.## % , up #.## percentage point from late Friday .
156 | Florida Board of Education 's # #/# % issue of #### was #/# point weaker at ## #/# bid .
157 | The # #/# % issue of Triborough Bridge and Tunnel Authority of New York , due #### , was off #/# at ## #/# bid .
158 | ADT , a security services and auctions company , trades on London 's Stock Exchange .
159 | Laidlaw is ##%-controlled by Canadian Pacific Ltd. , a Montreal transportation , resources and industrial holding concern .
160 | Pershare net fell to ###.# yen from ###.# yen because of expenses and capital adjustments .
161 | Domestic leisure sales , however , were lower .
162 | Hertz Corp. of Park Ridge , N.J. , said it retained Merrill Lynch Capital Markets to sell its Hertz Equipment Rental Corp. unit .
163 | `` We are only going to sell at the right price . ''
164 | Hertz Equipment had operating profit before depreciation of $ ## million on revenue of $ ### million in #### .
165 | The closely held Hertz Corp. had annual revenue of close to $ # billion in #### , of which $ #.# billion was contributed by its Hertz Rent A Car operations world-wide .
166 | It supplies commercial and industrial equipment including earth-moving , aerial , compaction and electrical equipment , compressors , cranes , forklifts and trucks .
167 | In the year-earlier quarter , the company reported net income of $ ###,### , or ## cents a share .
168 | In over-the-counter trading , Interspec fell ##.# cents to $ #.## .
169 | In the first nine months of #### , net was $ ## million , or $ #.## a share .
170 | Mr. Simmons said the third-quarter results reflect continued improvements in productivity and operating margins .
171 | U.S. Banknote said it believes the sale , if completed , apparently would satisfy antitrust issues raised by the U.S. Justice Department about U.S. Banknote 's offer to buy International Banknote .
172 | Both of the New York-based companies print stock certificates and currency .
173 | It also said the tender offer would probably have to be extended further to complete financing arrangements .
174 | The offer , made June # , has been extended several times .
175 | U.S. Banknote said that as of Oct. ## , ##.# million shares , or about ##.# % of the fully diluted shares outstanding , had been tendered .
176 | Gitano Group Inc. said it agreed to buy ## % of Regatta Sport Ltd. , a closely held apparel maker , with the assumption of $ # million of contingent debt .
177 | That ## % is now held by Clifford Parker , Regatta 's president and chief executive officer , who will continue to manage Regatta 's operations under Gitano .
178 | In #### , Regatta will have sales `` in excess of $ ## million '' and will show a profit , Mr. Parker said .
179 | Gitano , which makes budget-priced apparel sold mainly through mass merchandisers like K mart and Wal-Mart , said the Regatta acquisition will enhance its strategy to expand into department stores .
180 | This fall , Gitano began manufacturing moderately priced clothes aimed at department stores under the Gloria Vanderbilt trademark , which Gitano recently acquired .
181 | Those results included a $ #.# million charge related to the retirement of debt .
182 | Enron said each unit will be priced in the $ ##-to-$## range and will represent about ## % of the partnership equity .
183 | Arthur M. Goldberg said he extended his unsolicited tender offer of $ ## a share tender offer , or $ ###.# million , for Di Giorgio Corp. to Nov. # .
184 | DIG Acquisition Corp. , the New Jersey investor 's acquisition vehicle , said that as of the close of business yesterday , ###,### shares had been tendered .
185 | Including the stake DIG already held , DIG holds a total of about ## % of Di Giorgio 's shares on a fully diluted basis .
186 | The offer , which also includes common and preferred stock purchase rights , was to expire last night at midnight .
187 | The new expiration date is the date on which DIG 's financing commitments , which total about $ ### million , are to expire .
188 | DIG is a unit of DIG Holding Corp. , a unit of Rose Partners L.P .
189 | In New York Stock Exchange composite trading yesterday , Di Giorgio closed at $ ##.## a share , down $ #.## .
190 | A. manual typewriters , B. black-and-white snapshots , C. radio adventure shows .
191 | If you guessed black-and-white snapshots , you 're right .
192 | After years of fading into the background , two-tone photography is coming back .
193 | Trendy magazine advertisements feature stark black-and-white photos of Hollywood celebrities pitching jeans , shoes and liquor .
194 | Portrait studios accustomed to shooting only in color report a rush to black-and-white portrait orders .
195 | What 's happening in photography mirrors the popularity of black and white in fashion , home furnishings and cinematography .
196 | On Seventh Avenue , designers have been advancing the monochrome look with clothing collections done entirely in black and white .
197 | And classic black-and-white movies are enjoying a comeback on videocassette tapes , spurred , in part , by the backlash against colorization of old films .
198 | `` The pendulum is swinging back to black and white , '' says Richard DeMoulin , the general manager of Eastman Kodak Co. 's professional photography division .
199 | Until two years ago , sales of black-and-white film had been declining steadily since the ####s .
200 | Aimed at commercial photographers , the film can be used in very low light without sacrificing quality , says Donald Franz of Photofinishing Newsletter .
201 | Agfa recently signed Olympic gold medalist Florence Griffith-Joyner to endorse a new line of black-and-white paper that 's geared to consumers and will compete directly with Kodak 's papers .
202 | The biggest beneficiary of the black-and-white revival is likely to be International Paper Co. 's Ilford division , known in the industry for its premium products .
203 | Sales of Ilford 's four varieties of black-and-white film this year are outpacing growth in the overall market , although the company wo n't say by exactly how much .
204 | `` We hope the trend lasts , '' says Laurie DiCara , Ilford 's marketing communications director .
205 | `` It has an archival , almost nostalgic quality to it , '' says Owen B. Butler , the chairman of the applied photography department at Rochester Institute of Technology .
206 | `` You can shift out of reality with black and white , '' he adds .
207 | Such features have been especially attractive to professional photographers and marketing executives , who have been steadily increasing their use of black and white in advertising .
208 | Consider Gap Inc. , whose latest ad campaign features black-and-white shots of Hollywood stars , artists and other well-known personalities modeling the retailer 's jeans and T-shirts .
209 | Richard Crisman , the account manager for the campaign , says Gap did n't intentionally choose black and white to distinguish its ads from the color spreads of competitors .
210 | `` We wanted to highlight the individual , not the environment , '' he says , `` and black and white allows you to do that better than color . ''
211 | The campaign won a Cleo award as this year 's best ad by a specialty retailer .
212 | Even food products and automobiles , which have long depended on color , are making the switch .
213 | Other companies that are currently using two-tone ads include American Express Co. and Epson America Inc .
214 | Portrait studios have also latched onto the trend .
215 | His On-Broadway Photography studio in Portland , Ore. , doubled its business last year and , he says , is booked solid for the next five .
216 | One customer , Dayna Brunsdon , says she spurned a color portrait for black and white because `` it 's more dramatic .
217 | I show it to my friends , and they all say ` wow . '
218 | It is n't ordinary like color . ''
219 | Still , most consumers are n't plunking black-and-white film into their cameras to take family snapshots .
220 | Typically , it must be mailed to a handful of processors and may take a week or more to be processed and returned .
221 | But for photofinishers , developing costs for black-and-white film are higher .
222 | Some companies are starting to tackle that problem .
223 | Ilford , for example , recently introduced a black-and-white film that can be processed quickly by color labs .
224 | Intent on wooing customers , the company is also increasing its sponsorship of black-and-white photography classes .
225 | Similarly , Agfa is sponsoring scores of photography contests at high schools and colleges , offering free black-and-white film and paper as prizes .
226 | And Kodak is distributing an instructional video to processors on how to develop its monochrome film more efficiently .
227 | Other companies are introducing related products .
228 | Charles Beseler Co. , a leading maker of photographic enlargers , introduced last month a complete darkroom starter kit targeted at teen-agers who want to process their own black-and-white photographs .
229 | But some industry observers believe the resurgence of black and white is only a fad .
230 | They cite the emergence of still electronic photography , more newspapers turning to color on their pages and measurable improvements in the quality of color prints .
231 | `` Black and white has n't made the same quantum leaps in technological development as color , '' says Mr. Butler of the Rochester Institute .
232 | `` The color print today is far superior to prints of ## years ago .
233 | `` It 's got a classic spirit and carries over emotionally , '' says Alfred DeBat of Professional Photographers of America .
234 | `` That 's the appeal .
235 | Sales rose more than # % to $ ##.# million from $ ##.# million .
236 | The Sacramento , Calif. , company also attributed improved performance to a lower effective tax rate and higher interest income .
237 | Sales grew almost # % to $ ###.# million from $ ###.# million .
238 | McClatchy publishes the Sacramento -LRB- Calif . -RRB- Bee and Tacoma -LRB- Wash . -RRB- News Tribune , and other papers in Western states .
239 | Agip S.p . A. and Societe National Elf Aquitaine , the state oil companies of Italy and France , respectively , submitted an offer to buy Gatoil Suisse S.A .
240 | The price was n't disclosed .
241 | A spokesman for Gatoil said that the Swiss oil concern was examining the offer , submitted last Friday , along with two other offers , also submitted last week .
242 | Those two offers were private and the spokesman refused to identify the bidding companies .
243 | The spokesman further said that at least two more offers are expected from other companies within two weeks .
244 | The latest reading of ###.# was up from ###.# in July and ###.# as recently as March .
245 | The August rise marked the fifth straight monthly gain for the indicator , which uses the #### average as a base of ### .
246 | In contrast , the Commerce Department 's widely followed index of leading indicators , while up in August , has fallen repeatedly since reaching a high early this year .
247 | But the far stronger showing of the Columbia index `` makes a recession any time soon highly unlikely , '' says Geoffrey H. Moore , the director of the Columbia facility .
248 | The group normally convenes only when a change in the economy 's general course seems likely .
249 | `` No meeting is scheduled because the expansion shows no sign of going off the tracks , '' Mr. Moore reports .
250 | The comparable lead times for the Commerce index , whose components include the stock market , are far shorter -- ## months before recessions and only three months before recoveries .
251 | `` It was an entirely different pattern from what we 're seeing now , '' Mr. Moore says .
252 | The stock market has lost some precursory power , analysts at the Columbia center claim , because of the growing impact of international developments .
253 | BSN S.A. , a leading French food group , said it agreed to acquire Birkel G.m.b . H. , a West German pasta maker .
254 | The value of the acquisition was n't disclosed .
255 | The move is in line with BSN 's strategy of gradually building its share of the European pasta market through external growth .
256 | BSN will initially acquire a ## % interest in Birkel , a closely held concern .
257 | The French group has an agreement giving it the right to buy all the shares outstanding , and this could be completed within a few months , a BSN spokeswoman said .
258 | The acquisition strengthens BSN 's position in the European pasta market .
259 | The French group currently ranks second after Barilla Group of Italy , whose sales are chiefly in the Italian market .
260 | The agency said it confirmed American Continental 's preferred stock rating at C .
261 | CENTRUST SAVINGS BANK -LRB- Miami -RRB- --
262 | The rating agency also reduced the ratings for long-term deposits to B-# from Ba-# and for preferred stock to Ca from Caa .
263 | THE STOCK MARKET AVOIDED a repeat of Black Monday as prices recovered from an early slide , spurred by bargain-hunting institutions and program traders .
264 | The rally erased about half of Friday 's ###.##-point plunge , but analysts are cautious about the market 's outlook .
265 | The dollar also rebounded , while bond prices plummeted and Treasury bill rates soared .
266 | Junk bonds also recovered somewhat , though trading remained stalled .
267 | Gold also rose .
268 | Tokyo stock prices bounced back in early trading Tuesday following a #.# % plunge on Monday .
269 | The dollar also moved higher in Tokyo .
270 | AMR slid $ ##.### , to $ ##.## .
271 | Also , a UAL group tried to get financing for a lower bid , possibly $ ### a share .
272 | UAL fell $ ##.### , to $ ###.### .
273 | IBM 's earnings tumbled ## % in the third quarter , slightly more than expected .
274 | The computer giant partly cited a stronger dollar and a delay in shipping a new high-end disk drive .
275 | Output at Japanese-owned and managed plants in the U.S. is due to rise ## % .
276 | Budget director Darman said he wo n't give federal agencies much leeway in coping with Gramm-Rudman spending cuts , which took effect yesterday .
277 | Darman hopes to prod Congress to finish a deficit plan .
278 | The Ways and Means plan would create another possible obstacle to selling sick thrifts . 279 | The Supreme Court agreed to decide whether a federal court may dismantle a merger that has won regulatory approval but been ruled anticompetitive in a private suit . 280 | Merrill Lynch 's profit slid ## % in the third quarter . 281 | Bear Stearns posted a #.# % gain , while PaineWebber had a decline due to a year-ago gain . 282 | Blue Arrow of Britain plans to return to the name Manpower and take a big write-off . 283 | NCNB 's profit more than doubled . 284 | K mart agreed to acquire Pace Membership Warehouse for $ ### million , expanding its presence in the growing wholesale-store business . 285 | Markets -- 286 | Dow Jones industrials ####.## , up ##.## ; transportation ####.## , off ###.## ; utilities ###.## , up #.## . 287 | Commodities : Dow Jones futures index ###.## , off #.## ; spot index ###.## , up #.## . 288 | Dollar : ###.## yen , off #.## ; #.#### marks , off #.#### . 289 | Monday , October ## , #### 290 | The key U.S. and foreign annual interest rates below are a guide to general levels but do n't always represent actual transactions . 291 | The base rate on corporate loans at large U.S. money center commercial banks . 292 | Reserves traded among commercial banks for overnight use in amounts of $ # million or more . 293 | Source : Fulton Prebon -LRB-- U.S.A . -RRB-- Inc . 294 | The charge on loans to depository institutions by the New York Federal Reserve Bank . 295 | CALL MONEY : # #/# % to ## % . 296 | The charge on loans to brokers on stock exchange collateral . 297 | COMMERCIAL PAPER : High-grade unsecured notes sold through dealers by major corporations in multiples of $ #,### : #.## % ## days ; #.## % ## days ; #.## % ## days . 298 | Average of top rates paid by major New York banks on primary new issues of negotiable C.D.s , usually on amounts of $ # million and more . 
299 | Typical rates in the secondary market : #.## % one month ; #.## % three months ; #.## % six months . 300 | BANKERS ACCEPTANCES : #.## % ## days ; #.## % ## days ; #.## % ## days ; #.## % ### days ; #.## % ### days ; #.## % ### days . 301 | Negotiable , bank-backed business credit instruments typically financing an import order . 302 | The average of interbank offered rates for dollar deposits in the London market based on quotations at five major banks . 303 | FOREIGN PRIME RATES : Canada ##.## % ; Germany #.## % ; Japan #.### % ; Switzerland #.## % ; Britain ## % . 304 | These rate indications are n't directly comparable ; lending practices vary widely by location . 305 | FEDERAL HOME LOAN MORTGAGE CORP . -LRB-- Freddie Mac -RRB-- : Posted yields on ##-year mortgage commitments for delivery within ## days. 306 | #.## % , standard conventional fixedrate mortgages ; #.### % , # % rate capped one-year adjustable rate mortgages . 307 | MERRILL LYNCH READY ASSETS TRUST : #.## % . 308 | Annualized average rate of return after expenses for the past ## days ; not a forecast of future returns . 309 | Under the agreement , Intel will invest $ # million to acquire a # % stake in Alliant , a maker of minisupercomputers for scientists and engineers . 310 | Alliant said it plans to use the microprocessor in future products . 311 | In #### the Cincinnati company earned $ #.# million , or ## cents a share , on revenue of $ ###.# million . 312 | Omnicare said the division operates under the trade name ACCO and supplies the medical and dental markets . 313 | Burmah Oil PLC , a British independent oil and specialty chemicals marketing concern , said SHV Holdings N.V. of the Netherlands has built up a #.# % stake in the company . 314 | James Alexander , a Burmah spokesman , said SHV had previously owned `` a little under # % '' of Burmah for about two years . 315 | The Dutch company had n't notified Burmah of its reason for increasing the stake , he said . 
316 | Burmah , which owns the Castrol brand of lubricant oils , reported a ## % rise in net income to # ##.# million -LRB-- $ ##.# million -RRB-- in the first half . 317 | The group consists of Weslock Corp. and JPI Modern Inc . 318 | The company 's remaining business is the manufacture and sale of engine and transmissions products for industrial and transportation applications . 319 | Citing a $ #.# million provision for doubtful accounts , Dallas-based National Heritage Inc. posted a loss for its fourth quarter ended June ## . 320 | The company said the $ #.# million reserve was created to reflect doubt about the collectability of receivables owed to National Heritage by some of the real estate partnerships it manages . 321 | The company also said expenses incurred by the previous board and management in the recent contest for control were recognized primarily in the first quarter ended Sept. ## . 322 | `` The new structure will enable United Biscuits to focus clearly upon opportunities for planned growth during the ####s , '' said Bob Clarke , deputy chairman and group chief executive . 323 | Last month , United Biscuits agreed to sell its entire restaurant operations to Grand Metropolitan PLC for # ### million . 324 | An American journalist now is standing trial in Namibia . 325 | The most likely winner will be the Marxist-dominated SWAPO rebels . 326 | The U.S. journalist 's `` crime '' was writing that the head of the commission charged with overseeing the election 's fairness , Bryan O'Linn , was openly sympathetic to SWAPO . 327 | Shortly after that , Mr. O'Linn had Scott Stanley arrested and his passport confiscated . 328 | SWAPO has enjoyed favorable Western media treatment ever since the U.N. General Assembly declared it the `` sole , authentic representative '' of Namibia 's people in 329 | The elections are set for Nov. # . 330 | In July , Mr. 
Stanley , editor of American Press International , a Washington D.C.-based conservative wire service , visited Namibia to report on the U.N.-monitored election campaign . 331 | After Mr. Stanley 's article was published in two Namibian newspapers , Mr. O'Linn had criminal charges brought against their editors , publisher and lawyer . 332 | Mr. Stanley was arrested and charged along with the others when he returned to Namibia this month . 333 | Both the State Department and the Lawyers Committee for Freedom of the Press have protested Mr. Stanley 's detention . 334 | Both South African and SWAPO extremists are intimidating voters . 335 | It now has the chance to redress that record in Namibia . 336 | Commodity futures prices generally reflected the stability of the stock market following its plunge Friday . 337 | Yesterday , the stock market 's influence at first created nervousness . 338 | Trading in cotton and sugar was nervous and showed small declines . 339 | In Chicago , grain and soybean prices rose slightly . 340 | Livestock and meat prices , however , dropped on concern that a financial crisis would cut consumption of beef and pork . 341 | The spot October gold price rose $ # to $ ###.## an ounce . 342 | December silver was up #.# cents an ounce at $ #.### . 343 | Platinum behaved more like an industrial metal , easing early on concern over a possible weaker economy , but recovering later , as the stock market strengthened . 344 | For one thing , last Friday , precious metals markets closed before the stock market went into its late-in-the-day nose dive , so it could n't react to it . 345 | Back on Friday , Oct. ## , l### , the stock market declined during the day , and gold prices surged as the stock market fell . 346 | Yesterday 's October gain of $ # was miniscule compared with that . 347 | There 's a good chance that gold will retain its gains and rise further , he said . 
348 | He expects a drop in interest rates , which would help gold by keeping the dollar from rising . 349 | Finally , according to Mr. Cardillo , the impact of the strong dollar should be reflected in reduced exports in the August merchandise trade deficit , when the figures are released today . 350 | This would be damaging to the dollar and supportive for gold , he said . 351 | ENERGY : 352 | The U.S. benchmark crude , West Texas Intermediate , closed at $ ##.## a barrel for November delivery , down ## cents . 353 | But most market observers agreed that Friday 's stock market drop is what dampened spirits in the petroleum pits yesterday . 354 | Until yesterday , futures prices had been headed up on expectations that world oil demand will continue to be strong . 355 | The Organization of Petroleum Exporting Countries increased its production ceiling for the fourth quarter based on projections of robust demand . 356 | Indeed , after reacting early in the trading day to Friday 's plummet , futures prices firmed up again as traders took note of the stock market 's partial recovery yesterday . 357 | COPPER : 358 | Futures prices fell and showed little rebound as one major labor problem that had been underpinning prices appeared to be solved . 359 | Prices were down from the outset of trading on concern that a drop in the stock market might create a weakened economy and a consequent reduction in copper use . 360 | Highland Valley is a large Canadian producer and principal supplier to Japan , which recently began seeking copper elsewhere as its inventories shrank . 361 | Last week it was reported that company and union negotiations had overcome the major hurdle , the contracting out of work by the company . 362 | Now , the analyst said , only minor points remain to be cleaned up . 363 | `` For all intents and purposes , an agreement appears to have been achieved , '' he said . 
364 | Copper inventories in New York 's Commodity Exchange warehouses rose yesterday by ### tons , to ##,### tons . 365 | London Metal Exchange copper inventories last week declined ##,### tons , to ##,### tons . 366 | The LME stocks decline was about as expected , but the Comex gain was n't . 367 | However , this was brushed aside by concern over the stock market , the analyst said . 368 | `` It was simply overbought , '' he said , and selling by funds that are computer guided helped depress prices . 369 | Futures prices eased , more in reaction to Hurricane Jerry than to any influence of the stock market . 370 | The December contract ended with a loss of #.## cent a pound , at ##.## cents . 371 | Prices rose sharply Friday as the storm approached Texas and Louisiana , which is part of the Mississippi Delta cotton-growing area . 372 | However , after absorbing the potential effect of the hurricane , prices began to slip late Friday , Mr. Simon said . 373 | That selling continued yesterday and kept prices under pressure , he said . 374 | Colder weather is being predicted for the high plains of Texas and the northern states of the Delta during the coming weekend , Mr. Simon said . 375 | That has n't yet captured traders ' attention , he added . 376 | Futures prices declined . 377 | The March contract was off #.## cent a pound at ##.## cents . 378 | At one point in early trading the March price rose to as high as ##.## cents when the stock market recovered , but the price then fell back . 379 | India recently bought ###,### tons and was expected to buy more , the analyst said . 380 | Another analyst thought that India may have pulled back because of the concern over the stock market . 381 | At any rate , she added , `` India needs the sugar , so it will be in sooner or later to buy it . '' 382 | FARM PRODUCTS : 383 | The price of the hog contract for October delivery dropped its maximum permissible daily limit of #.# cents a pound . 
384 | The prices of most grain futures contracts rose slightly yesterday out of relief that the stock market was showing signs of recovering . 385 | Earlier in the session , the prices of several soybean contracts set new lows . 386 | A broad rally began when several major processors began buying futures contracts , apparently to take advantage of the price dip . 387 | The Knight-Ridder spokesman said the third-quarter earnings that the company plans to report Oct. ## are expected to be up . 388 | Dunkin ' Donuts is based in Randolph , Mass . 389 | Cara Operations , a food services concern , and Unicorp , a holding company with interests in oil and natural gas and financial services , are based in Toronto . 390 | The company earned $ ## million , or ## cents a share , in the year-ago quarter . 391 | Earlier repairs vented the CFCs out of the home through a hose directly into the atmosphere . 392 | CFCs are widely used as solvents , coolants and fire suppressants . 393 | However , Maxus said it believes the reserves in the field are about ## million barrels of oil . 394 | Maxus , an independent oil and gas concern , is the operator and owns a ## % interest in the new field , called Northeast Intan . 395 | The production-sharing contract area is held with Pertamina , the Indonesian state oil company . 396 | But because of confusion , it took those credits again in reporting its results through the first nine months . 397 | In Montreal Exchange trading , Memotec closed unchanged at ##.### Canadian dollars -LRB-- US$ #.## -RRB-- . 398 | Memotec is a Montreal-based maker of telecommunications products and provider of telecommunications and computer services . 399 | Memotec said the agreement calls for it to make a $ ##-a-share cash tender offer for all shares outstanding of ISI . 400 | Memotec said the tender offer is conditioned on , among other things , holders tendering at least ## % of the shares outstanding , other than the shares held by Mr. Johnston . 
401 | -------------------------------------------------------------------------------- /gtnlplib/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rguthrie3/DeepDependencyParsingProblemSet/4da796d631cb2f78725741be5c3c2a2a308b02bb/gtnlplib/__init__.py -------------------------------------------------------------------------------- /gtnlplib/constants.py: -------------------------------------------------------------------------------- 1 | # Data files 2 | TRAIN_FILE = "data/train.txt" 3 | DEV_FILE = "data/dev.txt" 4 | TEST_FILE = "data/test-hidden.txt" 5 | PRETRAINED_EMBEDS_FILE = "data/pretrained-embeds.pkl" 6 | 7 | # Prediction output files 8 | D3_2_DEV_FILENAME = "3_2-dev.preds" 9 | D3_2_TEST_FILENAME = "3_2-test.preds" 10 | D4_4_DEV_FILENAME = "4_4-dev.preds" 11 | D4_4_TEST_FILENAME = "4_4-test.preds" 12 | 13 | # Keys 14 | DEV_GOLD = "data/dev.key" 15 | 16 | class Actions: 17 | """Simple Enum for each possible parser action""" 18 | SHIFT = 0 19 | REDUCE_L = 1 20 | REDUCE_R = 2 21 | 22 | NUM_ACTIONS = 3 23 | 24 | action_to_ix = { "SHIFT": SHIFT, 25 | "REDUCE_L": REDUCE_L, 26 | "REDUCE_R": REDUCE_R } 27 | 28 | END_OF_INPUT_TOK = "<END-OF-INPUT>" 29 | NULL_STACK_TOK = "<NULL-STACK>" 30 | ROOT_TOK = "<ROOT>" 31 | 32 | HAVE_CUDA = False 33 | -------------------------------------------------------------------------------- /gtnlplib/data_tools.py: -------------------------------------------------------------------------------- 1 | from collections import namedtuple 2 | from gtnlplib.constants import END_OF_INPUT_TOK, NULL_STACK_TOK 3 | 4 | Instance = namedtuple("Instance", ["sentence", "action_sequence"]) 5 | 6 | 7 | def parse_file(filename): 8 | with open(filename, "r") as f: 9 | instances = [] 10 | vocab = set() 11 | for inst in f: 12 | sentence, actions = inst.split(" ||| ") 13 | 14 | # Make sure there is no leading/trailing whitespace 15 | sentence = sentence.strip().split() 16 | actions = actions.strip().split() 17 | 18 
| for word in sentence: 19 | vocab.add(word) 20 | 21 | instances.append(Instance(sentence, actions)) 22 | return instances, vocab 23 | 24 | 25 | def read_test_file(filename): 26 | with open(filename, "r") as f: 27 | sentences = [] 28 | vocab = set() 29 | for sent in f: 30 | sentence = sent.strip().split() 31 | for word in sentence: 32 | vocab.add(word) 33 | 34 | sentences.append(sentence) 35 | return sentences, vocab 36 | 37 | 38 | class Dataset: 39 | """ 40 | Class for holding onto the train, dev and test data 41 | and returning iterators over it. 42 | Also stores the full data's vocab in a set 43 | """ 44 | 45 | 46 | def __init__(self, train_filename, dev_filename, test_filename): 47 | self._vocab = set() 48 | if train_filename is not None: 49 | self._training_data, tmp = parse_file(train_filename) 50 | self._vocab.update(tmp) 51 | if dev_filename is not None: 52 | self._dev_data, tmp = parse_file(dev_filename) 53 | self._vocab.update(tmp) 54 | if test_filename is not None: 55 | self._test_data, tmp = read_test_file(test_filename) 56 | self._vocab.update(tmp) 57 | 58 | self._vocab.add(END_OF_INPUT_TOK) 59 | self._vocab.add(NULL_STACK_TOK) 60 | 61 | @property 62 | def training_data(self): 63 | return self._training_data 64 | 65 | @property 66 | def dev_data(self): 67 | return self._dev_data 68 | 69 | @property 70 | def test_data(self): 71 | return self._test_data 72 | 73 | @property 74 | def vocab(self): 75 | return self._vocab 76 | 77 | 78 | def make_file_key(output_filename, sentences, gold_parses): 79 | with open(output_filename, "w") as outfile: 80 | outfile.write("MOD-IDX\tMOD\tHEAD\tHEAD-IDX\n") 81 | for sentence, parse in zip(sentences, gold_parses): 82 | for i, word in enumerate(sentence): 83 | for edge in parse: 84 | if edge.modifier[1] == i: 85 | outfile.write("{}\t{}\t{}\t{}\n".format(i, word, edge.head[0], edge.head[1])) 86 | outfile.write("\n") 87 | 88 | 89 | def make_kaggle_key(output_filename, sentences, gold_parses): 90 | wc = 1 91 | with open(output_filename, 
"w") as outfile: 92 | outfile.write("Id,Prediction\n") 93 | for sentence, parse in zip(sentences, gold_parses): 94 | for i, word in enumerate(sentence): 95 | for edge in parse: 96 | if edge.modifier[1] == i: 97 | outfile.write("{},{}\n".format(wc, edge.head[1])) 98 | wc += 1 99 | -------------------------------------------------------------------------------- /gtnlplib/evaluation.py: -------------------------------------------------------------------------------- 1 | from gtnlplib.parsing import ParserState, DepGraphEdge 2 | from gtnlplib.utils import DummyCombiner 3 | from constants import Actions, ROOT_TOK 4 | 5 | def dependency_graph_from_oracle(sentence, actions): 6 | """ 7 | Take a sentence and a sequence of shift-reduce actions and a sentence 8 | and return a set of dependency edges for further evaluation 9 | """ 10 | stack = ParserState(sentence, [None]*len(sentence), DummyCombiner()) 11 | dependency_graph = set() 12 | 13 | for act in [ Actions.action_to_ix[a] for a in actions ]: 14 | if act == Actions.SHIFT: 15 | stack.shift() 16 | elif act == Actions.REDUCE_L: 17 | dependency_graph.add(stack.reduce_left()) 18 | elif act == Actions.REDUCE_R: 19 | dependency_graph.add(stack.reduce_right()) 20 | 21 | dependency_graph.add(DepGraphEdge((ROOT_TOK, -1), (stack.stack[-1].headword, stack.stack[-1].headword_pos))) 22 | return dependency_graph 23 | 24 | 25 | def exact_match(predicted, gold): 26 | return predicted == gold 27 | 28 | 29 | def fscore(predicted, gold): 30 | TP = float(len(predicted.intersection(gold))) 31 | FN = float(len(gold.difference(predicted))) 32 | FP = float(len(predicted.difference(gold))) 33 | return 2 * TP / (2 * TP + FN + FP) 34 | 35 | 36 | def attachment(predicted, gold): 37 | # this is kinda inefficeint but it doesnt really matter 38 | # not worth the trouble 39 | correct = 0 40 | total = 0 41 | for gold_edge in gold: 42 | for predicted_edge in predicted: 43 | if predicted_edge.modifier == gold_edge.modifier: 44 | if predicted_edge.head == 
gold_edge.head: 45 | correct += 1 46 | total += 1 47 | break 48 | return correct, total 49 | 50 | 51 | def compute_attachment(parser, data): 52 | correct = 0 53 | total = 0 54 | for sentence, actions in data: 55 | if len(sentence) <= 1: continue 56 | gold = dependency_graph_from_oracle(sentence, actions) 57 | predicted = parser.predict(sentence) 58 | correct_t, total_t = attachment(predicted, gold) 59 | correct += correct_t 60 | total += total_t 61 | return float(correct) / total 62 | 63 | 64 | def compute_metric(parser, data, metric): 65 | val = 0. 66 | for sentence, actions in data: 67 | if len(sentence) > 1: 68 | gold = dependency_graph_from_oracle(sentence, actions) 69 | predicted = parser.predict(sentence) 70 | val += metric(predicted, gold) 71 | return val / len(data) 72 | 73 | 74 | def output_preds(filename, parser, data): 75 | with open(filename, "w") as outfile: 76 | for sentence in data: 77 | pred_graph = parser.predict(sentence) 78 | for i, word in enumerate(sentence): 79 | for edge in pred_graph: 80 | if edge.modifier[1] == i: 81 | outfile.write("{}\t{}\t{}\t{}\n".format(i, word, edge.head[0], edge.head[1])) 82 | outfile.write("\n") 83 | 84 | 85 | def kaggle_output(filename, parser, data): 86 | wc = 1 87 | with open(filename, "w") as outfile: 88 | outfile.write("Id,Prediction\n") 89 | for sentence in data: 90 | pred_graph = parser.predict(sentence) 91 | for i, word in enumerate(sentence): 92 | for edge in pred_graph: 93 | if edge.modifier[1] == i: 94 | outfile.write("{},{}\n".format(wc, edge.head[1])) 95 | wc += 1 96 | -------------------------------------------------------------------------------- /gtnlplib/feat_extractors.py: -------------------------------------------------------------------------------- 1 | class SimpleFeatureExtractor: 2 | 3 | def get_features(self, parser_state, **kwargs): 4 | """ 5 | Take in all the information and return features. 6 | Your features should be autograd.Variable objects 7 | of embeddings. 
8 | 9 | The **kwargs is special Python syntax. You can pass additional keyword arguments, 10 | and they will show up in kwargs in your function body as a dict. For example: 11 | ** function call ** 12 | get_features(my_parser_state, action_history=[SHIFT, SHIFT]) 13 | ** in function body ** 14 | kwargs["action_history"] == [SHIFT, SHIFT] # true 15 | 16 | Check the Python documentation if you are confused. 17 | YOU DON'T NEED TO USE KWARGS IN THIS FUNCTION!!! 18 | It is there only so that if you make a more complicated feature extractor for your Kaggle submission, 19 | you can pass in arbitrary information about the parse without this feature extractor complaining. 20 | 21 | :param parser_state the ParserState object for the current parse (giving access 22 | to the stack and input buffer) 23 | :return A list of autograd.Variable objects, which are the embeddings of your 24 | features 25 | """ 26 | # STUDENT 27 | pass 28 | # END STUDENT 29 | -------------------------------------------------------------------------------- /gtnlplib/neural_net.py: -------------------------------------------------------------------------------- 1 | import gtnlplib.utils as utils 2 | import torch 3 | import torch.nn as nn 4 | import torch.autograd as ag 5 | import torch.nn.functional as F 6 | 7 | from gtnlplib.constants import Actions, HAVE_CUDA 8 | 9 | if HAVE_CUDA: 10 | import torch.cuda as cuda 11 | 12 | # ===-----------------------------------------------------------------------------=== 13 | # INITIAL EMBEDDING COMPONENTS 14 | # ===-----------------------------------------------------------------------------=== 15 | # These components are responsible for initializing the input buffer with embeddings. 16 | # An embedding must be supplied for each word in the sentence. 17 | # 18 | # This class of components has the interface 19 | # inputs: the input sentence as a list of strings 20 | # outputs: a list of autograd Variables, where the ith element of the list is the 21 | # embedding for the ith word. 
22 | # 23 | # The output of forward() for these components is what is used to initialize the 24 | # input buffer, and what will be shifted onto the stack, and used in combination 25 | # when doing reductions. 26 | 27 | 28 | class VanillaWordEmbeddingLookup(nn.Module): 29 | """ 30 | A component that simply returns a list of the word embeddings as 31 | autograd Variables. 32 | """ 33 | 34 | def __init__(self, word_to_ix, embedding_dim): 35 | """ 36 | Construct an embedding lookup table for use in the forward() 37 | function 38 | :param word_to_ix Dict mapping words to unique indices 39 | :param embedding_dim The dimensionality of the embeddings 40 | """ 41 | super(VanillaWordEmbeddingLookup, self).__init__() 42 | self.word_to_ix = word_to_ix 43 | self.embedding_dim = embedding_dim 44 | self.use_cuda = False 45 | 46 | # The transition parser wants to know the size of the embeddings 47 | # it is getting. Don't worry about this 48 | self.output_dim = embedding_dim 49 | 50 | # STUDENT 51 | # name your embedding member "word_embeddings" 52 | # END STUDENT 53 | 54 | 55 | def forward(self, sentence): 56 | """ 57 | :param sentence A list of strings, the text of the sentence 58 | :return A list of autograd.Variables, where list[i] is the 59 | embedding of word i in the sentence. 60 | NOTE: the Variables returned should be row vectors, that 61 | is, of shape (1, embedding_dim) 62 | """ 63 | inp = utils.sequence_to_variable(sentence, self.word_to_ix, self.use_cuda) 64 | embeds = [] # store each Variable in here 65 | # STUDENT 66 | # END STUDENT 67 | return embeds 68 | 69 | 70 | class BiLSTMWordEmbeddingLookup(nn.Module): 71 | """ 72 | In this component, you will use a Bi-Directional 73 | LSTM to get the initial embeddings. The embedding 74 | for word i to initialize the input buffer is the ith hidden state of the LSTM 75 | after passing the sentence through the LSTM. 
76 | """ 77 | 78 | def __init__(self, word_to_ix, word_embedding_dim, hidden_dim, num_layers, dropout): 79 | """ 80 | :param word_to_ix Dict mapping words to unique indices 81 | :param word_embedding_dim The dimensionality of the input word embeddings 82 | :param hidden_dim The dimensionality of the output embeddings that go 83 | on the stack 84 | :param num_layers The number of LSTM layers to have 85 | :param dropout Amount of dropout to have in LSTM 86 | """ 87 | super(BiLSTMWordEmbeddingLookup, self).__init__() 88 | self.word_to_ix = word_to_ix 89 | self.num_layers = num_layers 90 | self.word_embedding_dim = word_embedding_dim 91 | self.hidden_dim = hidden_dim 92 | self.use_cuda = False 93 | 94 | self.output_dim = hidden_dim 95 | 96 | # STUDENT 97 | # Construct the needed components in this order: 98 | # 1. An embedding lookup table 99 | # 2. The LSTM 100 | # Note we want the output dim to be hidden_dim, but since our LSTM 101 | # is bidirectional, we need to make the output of each direction hidden_dim/2 102 | # name your embedding member "word_embeddings" 103 | # END STUDENT 104 | 105 | self.hidden = self.init_hidden() 106 | 107 | def forward(self, sentence): 108 | """ 109 | This function has two parts 110 | 1. Look up the embeddings for the words in the sentence. 111 | These will be the inputs to the LSTM sequence model. 112 | NOTE: At this step, rather than be a list of embeddings, 113 | it should be a tensor of shape (len(sentence_idxs), 1, embedding_dim) 114 | The 1 is for the mini-batch size. Don't get confused by it, 115 | just make it that shape. 116 | 2. Now that you have your tensor of embeddings of shape (len(sentence_idxs, 1, embedding_dim)), 117 | You can pass it through your LSTM. 118 | Refer to the Pytorch documentation to see what the outputs are 119 | 3. 
Convert the outputs into the correct return type, which is a list of 120 | embeddings of shape (1, embedding_dim) 121 | NOTE: Make sure you are reassigning self.hidden to the new hidden state!!! 122 | :param sentence A list of strs, the words of the sentence 123 | """ 124 | assert self.word_to_ix is not None, "ERROR: Make sure to set word_to_ix on \ 125 | the embedding lookup components" 126 | inp = utils.sequence_to_variable(sentence, self.word_to_ix, self.use_cuda) 127 | 128 | # STUDENT 129 | # END STUDENT 130 | 131 | def init_hidden(self): 132 | """ 133 | PyTorch wants you to supply the last hidden state at each timestep 134 | to the LSTM. You shouldn't need to call this function explicitly 135 | """ 136 | if self.use_cuda: 137 | return (ag.Variable(cuda.FloatTensor(self.num_layers * 2, 1, self.hidden_dim // 2).zero_()), 138 | ag.Variable(cuda.FloatTensor(self.num_layers * 2, 1, self.hidden_dim // 2).zero_())) 139 | else: 140 | return (ag.Variable(torch.zeros(self.num_layers * 2, 1, self.hidden_dim // 2)), 141 | ag.Variable(torch.zeros(self.num_layers * 2, 1, self.hidden_dim // 2))) 142 | 143 | def clear_hidden_state(self): 144 | self.hidden = self.init_hidden() 145 | 146 | 147 | # ===-----------------------------------------------------------------------------=== 148 | # COMBINER NETWORK COMPONENTS 149 | # ===-----------------------------------------------------------------------------=== 150 | # These components have interface 151 | # inputs: head_embed, modifier_embed from the stack during a reduction 152 | # outputs: A new embedding to place back on the stack, representing the combination 153 | # of head and modifier 154 | 155 | class MLPCombinerNetwork(nn.Module): 156 | """ 157 | This network piece takes the top two elements of the stack's embeddings 158 | and combines them to create a new embedding after a reduction. 
159 | 160 | Ex.: 161 | 162 | Stack: 163 | | away | | Combine(away, ran) | 164 | |------| |--------------------| 165 | | ran | | man | 166 | |------| REDUCE_L |--------------------| 167 | | man | --------> | The | 168 | |------| |--------------------| 169 | | The | 170 | |------| 171 | 172 | Note that this means that this network gives a *dense output* 173 | 174 | The network architecture is: 175 | Inputs: 2 word embeddings (the head and the modifier embeddings) 176 | Output: Run through an affine map + tanh + affine 177 | """ 178 | 179 | def __init__(self, embedding_dim): 180 | """ 181 | Construct the linear components you will need in forward() 182 | NOTE: Think carefully about what the input and output 183 | dimensions of your linear layers should be 184 | :param embedding_dim The dimensionality of the embeddings 185 | """ 186 | super(MLPCombinerNetwork, self).__init__() 187 | 188 | # STUDENT 189 | # Construct the components in this order 190 | # 1. The first linear layer 191 | # 2. The second linear layer 192 | # The output of the first linear layer should be embedding_dim 193 | # (the rest of the input/output dims are thus totally determined) 194 | # END STUDENT 195 | 196 | def forward(self, head_embed, modifier_embed): 197 | """ 198 | HINT: use utils.concat_and_flatten() to combine head_embed and modifier_embed 199 | into a single tensor. 200 | 201 | :param head_embed The embedding of the head in the reduction 202 | :param modifier_embed The embedding of the modifier in the reduction 203 | :return The embedding of the combination as a row vector 204 | """ 205 | # STUDENT 206 | pass 207 | # END STUDENT 208 | 209 | 210 | class LSTMCombinerNetwork(nn.Module): 211 | """ 212 | A combiner network that does a sequence model over states, rather 213 | than just some simple encoder like above. 
214 | 215 | Input: 2 embeddings, the head embedding and modifier embedding 216 | Output: Concatenate the 2 embeddings together and do one timestep 217 | of the LSTM, returning the hidden state, which will be placed 218 | on the stack. 219 | """ 220 | 221 | def __init__(self, embedding_dim, num_layers, dropout): 222 | """ 223 | Construct your LSTM component for use in forward(). 224 | Think about what size the input and output of your LSTM 225 | should be 226 | 227 | :param embedding_dim Dimensionality of stack embeddings 228 | :param num_layers How many LSTM layers to use 229 | :param dropout The amount of dropout to use in LSTM 230 | """ 231 | super(LSTMCombinerNetwork, self).__init__() 232 | self.embedding_dim = embedding_dim 233 | self.num_layers = num_layers 234 | self.use_cuda = False 235 | 236 | # STUDENT 237 | # END STUDENT 238 | 239 | self.hidden = self.init_hidden() 240 | 241 | 242 | def init_hidden(self): 243 | """ 244 | PyTorch wants you to supply the last hidden state at each timestep 245 | to the LSTM. 
You shouldn't need to call this function explicitly 246 | """ 247 | if self.use_cuda: 248 | return (ag.Variable(cuda.FloatTensor(self.num_layers, 1, self.embedding_dim).zero_()), 249 | ag.Variable(cuda.FloatTensor(self.num_layers, 1, self.embedding_dim).zero_())) 250 | else: 251 | return (ag.Variable(torch.FloatTensor(self.num_layers, 1, self.embedding_dim).zero_()), 252 | ag.Variable(torch.FloatTensor(self.num_layers, 1, self.embedding_dim).zero_())) 253 | 254 | 255 | def forward(self, head_embed, modifier_embed): 256 | """ 257 | Do the next LSTM step, and return the hidden state as the new 258 | embedding for the reduction 259 | 260 | Here, note that PyTorch's LSTM wants the input to be a tensor with axis semantics 261 | (seq_len, batch_size, input_dimensionality), but we are not minibatching (so batch_size=1) 262 | and seq_len=1 since we are only doing 1 timestep 263 | 264 | NOTE: use utils.concat_and_flatten() like in the MLP Combiner 265 | NOTE: Make sure the tensor you hand to your LSTM is the size it wants: 266 | (seq_len, batch_size, embedding_dim), which in this case, is (1, 1, embedding_dim) 267 | NOTE: If you add more layers to the LSTM (more than 1), your code may break. 268 | To fix it, look at the value of self.hidden whenever you have more layers. 
269 | 270 | :param head_embed Embedding of the head word 271 | :param modifier_embed Embedding of the modifier 272 | """ 273 | # STUDENT 274 | pass 275 | # END STUDENT 276 | 277 | 278 | def clear_hidden_state(self): 279 | self.hidden = self.init_hidden() 280 | 281 | 282 | # ===-----------------------------------------------------------------------------=== 283 | # ACTION CHOOSING COMPONENTS 284 | # ===-----------------------------------------------------------------------------=== 285 | class ActionChooserNetwork(nn.Module): 286 | """ 287 | This network piece takes a bunch of features from the current 288 | state of the parser and runs them through an MLP, 289 | returning log probabilities over actions 290 | 291 | The network should be 292 | inputs -> affine layer -> relu -> affine layer -> log softmax 293 | """ 294 | 295 | def __init__(self, input_dim): 296 | """ 297 | Construct the linear components that you need in forward() here. 298 | Think carefully about the input and output dimensionality of your linear layers 299 | HINT: What should be the dimensionality of your log softmax at the end? 300 | 301 | :param input_dim The dimensionality of your input: that is, when all your 302 | feature embeddings are concatenated together 303 | """ 304 | super(ActionChooserNetwork, self).__init__() 305 | # STUDENT 306 | # Construct in this order: 307 | # 1. The first linear layer (the one that is called first in the network) 308 | # 2. The second linear layer 309 | # END STUDENT 310 | 311 | def forward(self, inputs): 312 | """ 313 | NOTE: Use utils.concat_and_flatten to combine all the features into one big 314 | row vector. 
315 | 316 |         :param inputs A list of autograd.Variables, which are all of the features we will use 317 |         :return A Variable holding the log probabilities of the actions, of shape (1, 3) 318 |             (it is a row vector, with an entry for each action) 319 |         """ 320 |         # STUDENT 321 |         pass 322 |         # END STUDENT 323 | -------------------------------------------------------------------------------- /gtnlplib/parsing.py: -------------------------------------------------------------------------------- 1 | from collections import deque, namedtuple 2 | 3 | import torch 4 | import torch.nn as nn 5 | import torch.autograd as ag 6 | 7 | from gtnlplib.constants import Actions, NULL_STACK_TOK, END_OF_INPUT_TOK, HAVE_CUDA, ROOT_TOK 8 | import gtnlplib.utils as utils 9 | import gtnlplib.neural_net as neural_net 10 | 11 | if HAVE_CUDA: 12 |     import torch.cuda as cuda 13 | 14 | 15 | # Named tuples are basically like C structs: instead of accessing elements 16 | # by bare indices that carry no semantics, you access the fields of the tuple by name. 17 | # See the Python docs for collections.namedtuple 18 | DepGraphEdge = namedtuple("DepGraphEdge", ["head", "modifier"]) 19 | 20 | # These are what initialize the input buffer, and are used on the stack.
21 | # headword: The head word, stored as a string 22 | # headword_pos: The position of the headword in the sentence as an int 23 | # embedding: The embedding of the phrase as an autograd.Variable 24 | StackEntry = namedtuple("StackEntry", ["headword", "headword_pos", "embedding"]) 25 | 26 | 27 | class ParserState: 28 |     """ 29 |     Manages the state of a parse by keeping track of 30 |     the input buffer and stack, and offering a public interface for 31 |     doing actions (i.e., SHIFT, REDUCE_L, REDUCE_R) 32 |     """ 33 | 34 |     def __init__(self, sentence, sentence_embs, combiner, null_stack_tok_embed=None): 35 |         """ 36 |         :param sentence A list of strings, the words in the sentence 37 |         :param sentence_embs A list of ag.Variable objects, where the ith element is the embedding 38 |             of the ith word in the sentence 39 |         :param combiner A network component that gives an output embedding given two input embeddings 40 |             when doing a reduction 41 |         :param null_stack_tok_embed ag.Variable The embedding of NULL_STACK_TOK 42 |         """ 43 |         self.combiner = combiner 44 | 45 |         # Input buffer is a list, along with an index into the list. 46 |         # curr_input_buff_idx points to the *next element to pop off the input buffer* 47 |         self.curr_input_buff_idx = 0 48 |         self.input_buffer = [ StackEntry(we[0], pos, we[1]) for pos, we in enumerate(zip(sentence, sentence_embs)) ] 49 | 50 |         self.stack = [] 51 |         self.null_stack_tok_embed = null_stack_tok_embed 52 | 53 |     def shift(self): 54 |         next_item = self.input_buffer[self.curr_input_buff_idx] 55 |         self.stack.append(next_item) 56 |         self.curr_input_buff_idx += 1 57 | 58 |     def reduce_left(self): 59 |         return self._reduce(Actions.REDUCE_L) 60 | 61 |     def reduce_right(self): 62 |         return self._reduce(Actions.REDUCE_R) 63 | 64 |     def done_parsing(self): 65 |         """ 66 |         Returns True if we are done parsing. 67 |         Otherwise, returns False. 68 |         Remember that we are padding the input with an END_OF_INPUT_TOK token.
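As an aside, the named-tuple convention used throughout `parsing.py` (see `StackEntry` and `DepGraphEdge` above) can be illustrated with a tiny stdlib-only sketch; the words and positions here are made up for illustration:

```python
from collections import namedtuple

# Same shape as the DepGraphEdge defined in parsing.py: both fields are
# (word, position) pairs, so an edge stays unique even when a word
# occurs more than once in the sentence.
DepGraphEdge = namedtuple("DepGraphEdge", ["head", "modifier"])

edge = DepGraphEdge(head=("can", 1), modifier=("They", 0))
print(edge.head)      # ('can', 1)
print(edge.modifier)  # ('They', 0)
```

A named tuple is still an ordinary tuple, so `edge == (("can", 1), ("They", 0))` also holds; the named fields just make the code self-documenting.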
69 |         self.curr_input_buff_idx is just an index to the next token in the input buffer 70 |         (i.e., each SHIFT increments this index by 1). 71 |         The END_OF_INPUT_TOK token should never be shifted onto the stack. 72 |         """ 73 |         # STUDENT 74 |         pass 75 |         # END STUDENT 76 | 77 |     def stack_len(self): 78 |         return len(self.stack) 79 | 80 |     def input_buffer_len(self): 81 |         return len(self.input_buffer) - self.curr_input_buff_idx 82 | 83 |     def stack_peek_n(self, n): 84 |         """ 85 |         Look at the top n items on the stack. 86 |         If you ask for more than are on the stack, copies of the null_stack_tok_embed 87 |         are returned 88 |         :param n How many items to look at 89 |         """ 90 |         if len(self.stack) - n < 0: 91 |             return [ StackEntry(NULL_STACK_TOK, -1, self.null_stack_tok_embed) ] * (n - len(self.stack)) \ 92 |                 + self.stack[:] 93 |         return self.stack[-n:] 94 | 95 |     def input_buffer_peek_n(self, n): 96 |         """ 97 |         Look at the next n words in the input buffer 98 |         :param n How many words ahead to look 99 |         """ 100 |         assert self.curr_input_buff_idx + n - 1 <= len(self.input_buffer) 101 |         return self.input_buffer[self.curr_input_buff_idx:self.curr_input_buff_idx+n] 102 | 103 |     def _reduce(self, action): 104 |         """ 105 |         Reduce the top two items on the stack, combine them, 106 |         and place the combination back on the stack. 107 |         Combination means running the embeddings of the two stack items 108 |         through your combiner network and getting a dense output. 109 |         (The ParserState stores the combiner in the instance variable self.combiner). 110 | 111 |         Important things to note: 112 |         - Make sure you get which word is the head and which word 113 |           is the modifier correct 114 |         - Make sure that the order of the embeddings you pass into 115 |           the combiner is correct. The head word should always go first, 116 |           and the modifier second.
Mixing it up will make it more difficult 117 |           to learn the relationships it needs to learn 118 |         - Make sure when creating the new dependency graph edge, that you store it in the DepGraphEdge object 119 |           like this ( (head word, position of head word), (modifier word, position of modifier) ). 120 |           Keeping track of the positions in the sentence is necessary to be able to uniquely 121 |           identify edges when a sentence contains the same word multiple times. 122 |           Technically the position is all that is needed, but it will be easier to debug 123 |           if you carry along the word too. 124 | 125 |         :param action Whether we reduce left or reduce right 126 |         :return DepGraphEdge The edge that was formed in the dependency graph. 127 |         """ 128 |         assert len(self.stack) >= 2, "ERROR: Cannot reduce with stack length less than 2" 129 | 130 |         # STUDENT 131 |         # hint: use list.pop() 132 |         # END STUDENT 133 | 134 |     def __str__(self): 135 |         """ 136 |         Print the state for debugging 137 |         """ 138 |         # only print the words, don't want to print the embeddings too 139 |         return "Stack: {}\nInput Buffer: {}\n".format([ entry.headword for entry in self.stack ], 140 |             [ entry.headword for entry in self.input_buffer[self.curr_input_buff_idx:] ]) 141 | 142 | 143 | 144 | class TransitionParser(nn.Module): 145 | 146 |     def __init__(self, feature_extractor, word_embedding_component, action_chooser_component, combiner_component): 147 |         """ 148 |         :param feature_extractor A FeatureExtractor object to get features 149 |             from the parse state for making decisions 150 |         :param word_embedding_component Network component to get embeddings for each word in the sentence 151 |             TODO implement this 152 |         :param action_chooser_component Network component that gives probabilities over actions (makes decisions) 153 |         :param combiner_component Network component to combine embeddings during reductions 154 |         """ 155 |         super(TransitionParser, self).__init__() 156 | 157 |         self.word_embedding_component =
word_embedding_component 158 |         self.feature_extractor = feature_extractor 159 |         self.combiner = combiner_component 160 |         self.action_chooser = action_chooser_component 161 |         self.use_cuda = False 162 | 163 |         # This embedding is what is returned to indicate that part of the stack is empty 164 |         self.null_stack_tok_embed = nn.Parameter(torch.randn(1, word_embedding_component.output_dim)) 165 | 166 | 167 |     def forward(self, sentence, actions=None): 168 |         """ 169 |         Does the core parsing logic. 170 |         If you are supplied actions, you should do those. 171 |         Make sure to return everything that needs to be returned: 172 |         * The log probabilities from every choice made 173 |         * The dependency graph 174 |         * The actions you did, as a list 175 | 176 |         The boilerplate at the beginning just does padding and initialization; 177 |         you can basically ignore it. Just know that it initializes a valid 178 |         ParserState object, and now you may do actions on that state by calling 179 |         shift(), reduce_right(), reduce_left(), or get features from it in your 180 |         feature extractor. 181 | 182 |         If you are not supplied actions and instead do the actions your 183 |         network predicts, make sure that you only do valid actions. 184 | 185 |         Also, note that symbolic constants have been defined for the different Actions in constants.py 186 |         E.g. Actions.SHIFT is 0 and Actions.REDUCE_L is 1, so that the 0th element of 187 |         the output of your action chooser is the log probability of SHIFT, the 1st is the log probability 188 |         of REDUCE_L, etc.
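The SHIFT/REDUCE_L/REDUCE_R indexing convention described above can be sketched with plain Python (the log-probability values below are made up; in the real parser they come from the action chooser network):

```python
# Index convention from gtnlplib/constants.py: SHIFT=0, REDUCE_L=1, REDUCE_R=2
ACTION_NAMES = ["SHIFT", "REDUCE_L", "REDUCE_R"]

log_probs = [-0.3, -1.5, -2.4]  # fake (1, 3) row of action log probabilities

# Taking the argmax over the row picks the next action to perform
best_ix = max(range(len(log_probs)), key=lambda i: log_probs[i])
print(best_ix, ACTION_NAMES[best_ix])  # 0 SHIFT
```

When no gold actions are supplied, the predicted argmax must still be checked for validity (e.g. a reduce needs at least two stack items) before it is executed.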
189 | """ 190 | self.refresh() # clear up hidden states from last run, if need be 191 | 192 | padded_sent = sentence + [END_OF_INPUT_TOK] 193 | 194 | # Initialize the parser state 195 | sentence_embs = self.word_embedding_component(padded_sent) 196 | 197 | parser_state = ParserState(padded_sent, sentence_embs, self.combiner, null_stack_tok_embed=self.null_stack_tok_embed) 198 | outputs = [] # Holds the output of each action decision 199 | actions_done = [] # Holds all actions we have done 200 | 201 | dep_graph = set() # Build this up as you go 202 | 203 | # Make the action queue if we have it 204 | if actions is not None: 205 | action_queue = deque() 206 | action_queue.extend([ Actions.action_to_ix[a] for a in actions ]) 207 | have_gold_actions = True 208 | else: 209 | have_gold_actions = False 210 | 211 | # STUDENT 212 | # END STUDENT 213 | 214 | dep_graph.add(DepGraphEdge((ROOT_TOK, -1), (parser_state.stack[-1].headword, parser_state.stack[-1].headword_pos))) 215 | return outputs, dep_graph, actions_done 216 | 217 | 218 | def refresh(self): 219 | if isinstance(self.combiner, neural_net.LSTMCombinerNetwork): 220 | self.combiner.clear_hidden_state() 221 | if isinstance(self.word_embedding_component, neural_net.BiLSTMWordEmbeddingLookup): 222 | self.word_embedding_component.clear_hidden_state() 223 | 224 | 225 | def predict(self, sentence): 226 | _, dep_graph, _ = self.forward(sentence) 227 | return dep_graph 228 | 229 | 230 | def predict_actions(self, sentence): 231 | _, _, actions_done = self.forward(sentence) 232 | return actions_done 233 | 234 | 235 | def to_cuda(self): 236 | self.use_cuda = True 237 | self.word_embedding_component.use_cuda = True 238 | self.combiner.use_cuda = True 239 | self.cuda() 240 | 241 | 242 | def to_cpu(self): 243 | self.use_cuda = False 244 | self.word_embedding_component.use_cuda = False 245 | self.combiner.use_cuda = False 246 | self.cpu() 247 | 248 | 249 | def train(data, model, optimizer, verbose=True): 250 | criterion = 
nn.NLLLoss() 251 | 252 | if model.use_cuda: 253 | criterion.cuda() 254 | 255 | correct_actions = 0 256 | total_actions = 0 257 | tot_loss = 0. 258 | instance_count = 0 259 | 260 | for sentence, actions in data: 261 | 262 | if len(sentence) <= 2: 263 | continue 264 | 265 | optimizer.zero_grad() 266 | model.refresh() 267 | 268 | outputs, _, actions_done = model(sentence, actions) 269 | 270 | if model.use_cuda: 271 | loss = ag.Variable(cuda.FloatTensor([0])) 272 | action_idxs = [ ag.Variable(cuda.LongTensor([ a ])) for a in actions_done ] 273 | else: 274 | loss = ag.Variable(torch.FloatTensor([0])) 275 | action_idxs = [ ag.Variable(torch.LongTensor([ a ])) for a in actions_done ] 276 | 277 | for output, act in zip(outputs, action_idxs): 278 | loss += criterion(output.view(-1, 3), act) 279 | 280 | tot_loss += utils.to_scalar(loss.data) 281 | instance_count += 1 282 | 283 | for gold, output in zip(actions_done, outputs): 284 | pred_act = utils.argmax(output.data) 285 | if pred_act == gold: 286 | correct_actions += 1 287 | total_actions += len(outputs) 288 | 289 | loss.backward() 290 | optimizer.step() 291 | 292 | acc = float(correct_actions) / total_actions 293 | loss = float(tot_loss) / instance_count 294 | if verbose: 295 | print "Number of instances: {} Number of network actions: {}".format(instance_count, total_actions) 296 | print "Acc: {} Loss: {}".format(float(correct_actions) / total_actions, tot_loss / instance_count) 297 | 298 | 299 | def evaluate(data, model, verbose=False): 300 | 301 | correct_actions = 0 302 | total_actions = 0 303 | tot_loss = 0. 
304 | instance_count = 0 305 | criterion = nn.NLLLoss() 306 | 307 | for sentence, actions in data: 308 | 309 | if len(sentence) > 1: 310 | outputs, _, actions_done = model(sentence, actions) 311 | 312 | loss = ag.Variable(torch.FloatTensor([0])) 313 | action_idxs = [ ag.Variable(torch.LongTensor([ a ])) for a in actions_done ] 314 | for output, act in zip(outputs, action_idxs): 315 | loss += criterion(output.view((-1, 3)), act) 316 | 317 | tot_loss += utils.to_scalar(loss.data) 318 | instance_count += 1 319 | 320 | for gold, output in zip(actions_done, outputs): 321 | pred_act = utils.argmax(output.data) 322 | if pred_act == gold: 323 | correct_actions += 1 324 | 325 | total_actions += len(outputs) 326 | 327 | acc = float(correct_actions) / total_actions 328 | loss = float(tot_loss) / instance_count 329 | if verbose: 330 | print "Number of instances: {} Number of network actions: {}".format(instance_count, total_actions) 331 | print "Acc: {} Loss: {}".format(float(correct_actions) / total_actions, tot_loss / instance_count) 332 | return acc, loss 333 | -------------------------------------------------------------------------------- /gtnlplib/utils.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.autograd as ag 3 | from gtnlplib.constants import END_OF_INPUT_TOK, HAVE_CUDA 4 | 5 | if HAVE_CUDA: 6 | import torch.cuda as cuda 7 | 8 | 9 | def word_to_variable_embed(word, word_embeddings, word_to_ix): 10 | return word_embeddings(ag.Variable(torch.LongTensor([ word_to_ix[word] ]))) 11 | 12 | 13 | def sequence_to_variable(sequence, to_ix, use_cuda=False): 14 | if use_cuda: 15 | return ag.Variable( cuda.LongTensor([ to_ix[t] for t in sequence ]) ) 16 | else: 17 | return ag.Variable( torch.LongTensor([ to_ix[t] for t in sequence ]) ) 18 | 19 | 20 | def to_scalar(var): 21 | """ 22 | Wrap up the terse, obnoxious code to go from torch.Tensor to 23 | a python int / float (is there a better way?) 
24 | """ 25 | if isinstance(var, ag.Variable): 26 | return var.data.view(-1).tolist()[0] 27 | else: 28 | return var.view(-1).tolist()[0] 29 | 30 | 31 | def argmax(vector): 32 | """ 33 | Takes in a row vector (1xn) and returns its argmax 34 | """ 35 | _, idx = torch.max(vector, 1) 36 | return to_scalar(idx) 37 | 38 | 39 | def concat_and_flatten(items): 40 | """ 41 | Concatenate feature vectors together in a way that they can be handed into 42 | a linear layer 43 | :param items A list of ag.Variables which are vectors 44 | :return One long row vector of all of the items concatenated together 45 | """ 46 | return torch.cat(items, 1).view(1, -1) 47 | 48 | 49 | def initialize_with_pretrained(pretrained_embeds, word_embedding_component): 50 | """ 51 | Initialize the embedding lookup table of word_embedding_component with the embeddings 52 | from pretrained_embeds. 53 | Remember that word_embedding_component has a word_to_ix member you will have to use. 54 | For every word that we do not have a pretrained embedding for, keep the default initialization. 
55 | :param pretrained_embeds dict mapping word to python list of floats (the embedding 56 | of that word) 57 | :param word_embedding_component The network component to initialize (i.e, a VanillaWordEmbeddingLookup 58 | or BiLSTMWordEmbeddingLookup) 59 | """ 60 | # STUDENT 61 | for word, index in word_embedding_component.word_to_ix.items(): 62 | if word in pretrained_embeds: 63 | word_embedding_component.word_embeddings.weight.data[index] = torch.Tensor(pretrained_embeds[word]) 64 | # END STUDENT 65 | 66 | 67 | # ===----------------------------------------------------------------=== 68 | # Dummy classes that let us test parsing logic without having the 69 | # necessary components implemented yet 70 | # ===----------------------------------------------------------------=== 71 | class DummyCombiner: 72 | 73 | def __call__(self, head, modifier): 74 | return head 75 | 76 | 77 | class DummyActionChooser: 78 | 79 | def __init__(self): 80 | self.counter = 0 81 | 82 | def __call__(self, inputs): 83 | self.counter += 1 84 | return ag.Variable(torch.Tensor([0., 0., 1.])) 85 | 86 | 87 | class DummyWordEmbeddingLookup: 88 | 89 | def __init__(self): 90 | self.word_embeddings = lambda x: None 91 | self.counter = 0 92 | 93 | def __call__(self, sentence): 94 | self.counter += 1 95 | return [None]*len(sentence) 96 | 97 | 98 | class DummyFeatureExtractor: 99 | 100 | def __init__(self): 101 | self.counter = 0 102 | 103 | def get_features(self, parser_state): 104 | self.counter += 1 105 | return [] 106 | -------------------------------------------------------------------------------- /make-submission.sh: -------------------------------------------------------------------------------- 1 | # To run, type the command: 2 | # sh make-submission.sh 3 | 4 | SUBMIT_FILES="gtnlplib/ tests/ pset4.ipynb text-answers.md *.preds" 5 | TARBALL_NAME="problemset4.tar.gz" 6 | tar zvcf ${TARBALL_NAME} ${SUBMIT_FILES} 7 | echo "===---------------------------------------------------------------===" 8 | 
echo "READ THIS" 9 | echo "===---------------------------------------------------------------===" 10 | echo "Submit ${TARBALL_NAME} on T-Square. DO NOT MODIFY THE FILE NAME" 11 | echo "CHECK BEFORE SUBMITTING THAT IT CONTAINS EVERYTHING YOU NEED" 12 | echo "Run the command sequence:" 13 | echo "$ mkdir temp" 14 | echo "$ cd temp" 15 | echo "$ cp ../${TARBALL_NAME} ." 16 | echo "$ tar xvzf ${TARBALL_NAME}" 17 | echo "And check that your code, ipython notebook, text-answers.md, *.preds files are unzipped" 18 | echo "////////////////////////////////////////////////////////////////////////" 19 | -------------------------------------------------------------------------------- /pset4.ipynb: -------------------------------------------------------------------------------- 1 | { 2 |  "cells": [ 3 |   { 4 |    "cell_type": "markdown", 5 |    "metadata": {}, 6 |    "source": [ 7 |     "# Deep Transition Dependency Parser in PyTorch\n", 8 |     "\n", 9 |     "In this problem set, you will implement a deep transition dependency parser in [PyTorch](https://pytorch.org). PyTorch is a popular deep learning framework providing a variety of components for constructing neural networks. You will see how network architectures more sophisticated than the simple feed-forward networks you have learned about in earlier classes can be used to solve a structured prediction problem."
10 | ] 11 | }, 12 | { 13 | "cell_type": "code", 14 | "execution_count": 1, 15 | "metadata": { 16 | "collapsed": false 17 | }, 18 | "outputs": [], 19 | "source": [ 20 | "import gtnlplib.parsing as parsing\n", 21 | "import gtnlplib.data_tools as data_tools\n", 22 | "import gtnlplib.constants as consts\n", 23 | "import gtnlplib.evaluation as evaluation\n", 24 | "import gtnlplib.utils as utils\n", 25 | "import gtnlplib.feat_extractors as feat_extractors\n", 26 | "import gtnlplib.neural_net as neural_net\n", 27 | "\n", 28 | "import torch\n", 29 | "import torch.optim as optim\n", 30 | "import torch.nn as nn\n", 31 | "import torch.nn.functional as F\n", 32 | "import torch.autograd as ag\n", 33 | "\n", 34 | "from collections import defaultdict" 35 | ] 36 | }, 37 | { 38 | "cell_type": "code", 39 | "execution_count": 2, 40 | "metadata": { 41 | "collapsed": false 42 | }, 43 | "outputs": [], 44 | "source": [ 45 | "# Read in the dataset\n", 46 | "dataset = data_tools.Dataset(consts.TRAIN_FILE, consts.DEV_FILE, consts.TEST_FILE)\n", 47 | "\n", 48 | "# Assign each word a unique index, including the two special tokens\n", 49 | "word_to_ix = { word: i for i, word in enumerate(dataset.vocab) }" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": 3, 55 | "metadata": { 56 | "collapsed": true 57 | }, 58 | "outputs": [], 59 | "source": [ 60 | "# Some constants to keep around\n", 61 | "LSTM_NUM_LAYERS = 1\n", 62 | "TEST_EMBEDDING_DIM = 5\n", 63 | "WORD_EMBEDDING_DIM = 64\n", 64 | "STACK_EMBEDDING_DIM = 100\n", 65 | "NUM_FEATURES = 3\n", 66 | "\n", 67 | "# Hyperparameters\n", 68 | "ETA_0 = 0.01\n", 69 | "DROPOUT = 0.0" 70 | ] 71 | }, 72 | { 73 | "cell_type": "code", 74 | "execution_count": 4, 75 | "metadata": { 76 | "collapsed": true 77 | }, 78 | "outputs": [], 79 | "source": [ 80 | "def make_dummy_parser_state(sentence):\n", 81 | " dummy_embeds = [ w + \"-EMBEDDING\" for w in sentence ] + [consts.END_OF_INPUT_TOK + \"-EMBEDDING\"]\n", 82 | " return 
parsing.ParserState(sentence + [consts.END_OF_INPUT_TOK], dummy_embeds, utils.DummyCombiner())" 83 | ] 84 | }, 85 | { 86 | "cell_type": "markdown", 87 | "metadata": {}, 88 | "source": [ 89 | "# High-Level Overview of the Parser\n", 90 | "Be sure that you have reviewed the notes on transition-based dependency parsing, and are familiar with the relevant terminology. One small difference is that the text describes `arc-left` and `arc-right` actions, which create arcs between the top of the stack and the front of the buffer; in contrast, the parser you will implement here uses `reduce-left` and `reduce-right` actions, which create arcs between the top two items on the stack.\n", 91 | "\n", 92 | "Parsing will proceed as follows:\n", 93 | "* Initialize your parsing stack and input buffer.\n", 94 | "* At each step, extract some features. These can be anything: words in the sentence, the configuration of the stack, the configuration of the input buffer, the previous action, etc.\n", 95 | "* Send these features through a multi-layer perceptron (MLP) to get a probability distribution over actions (SHIFT, REDUCE_L, REDUCE_R). 
The next action you choose is the one with the highest probability.\n", 96 | "* If the action is either reduce left or reduce right, you use a neural network to combine the items being reduced and get a dense output to place back on the stack.\n", 97 | "\n", 98 | "The key classes you will fill in code for are\n", 99 | "* Feature extraction in `feat_extractors.py`\n", 100 | "* The `ParserState` class, which keeps track of the input buffer and parse stack, and offers a public interface for doing the parsing actions to update the state\n", 101 | "* The `TransitionParser` class, which is a PyTorch module where the core parsing logic resides in `parsing.py`.\n", 102 | "* The neural network components in `neural_net.py`\n", 103 | "\n", 104 | "The network components are compartmentalized as follows:\n", 105 | "* **Parsing**: `TransitionParser` is the base component that contains and coordinates the other substitutable components.\n", 106 | "* **Embedding Lookup**: You will implement two flavors of getting embeddings. These embeddings are used to initialize the input buffer, and will be shifted on the stack / serve as inputs to the combiner networks (explained below).\n", 107 | " - `VanillaWordEmbeddingLookup` just gets embeddings from a lookup table, one per word in the sentence.\n", 108 | " - `BiLSTMWordEmbeddingLookup` is more fancy, running a sequence model in both directions over the sentence. The hidden state at step t is the embedding for the t'th word of the sentence. \n", 109 | "* **Action Choosing**: This is a simple multilayer perceptron (MLP) that outputs log probabilities over actions\n", 110 | "* **Combiners**: These networks take the two embeddings of the items being reduced, and combine them into a single embedding. 
You will create two versions of this:\n", 111 |     "  - `MLPCombinerNetwork` takes the two input embeddings and gives a dense output\n", 112 |     "  - `LSTMCombinerNetwork` does a sequence model, where the output embedding is the hidden state of the next timestep." 113 |    ] 114 |   }, 115 |   { 116 |    "cell_type": "markdown", 117 |    "metadata": {}, 118 |    "source": [ 119 |     "## Example\n", 120 |     "\n", 121 |     "The following is how the input buffer and stack look at each step of a parse, up to the first reduction. The input sentence is \"the dog ran away\". Our action chooser network takes the top two elements of the stack plus a one-token lookahead in the input buffer. $C(x,y)$ refers to calling our combiner network on arguments $x, y$. Also let $A$ be the set of actions: $\\{ \\text{SHIFT}, \\text{REDUCE-L}, \\text{REDUCE-R} \\}$, and let $q_w$ be the embedding for word $w$.\n", 122 |     "\n", 123 |     "### Step 1\n", 124 |     " * Input Buffer: $\\left[ q_\\text{the}, q_\\text{dog}, q_\\text{ran}, q_\\text{away}, q_\\text{END-INPUT} \\right]$\n", 125 |     " * Stack: $\\left[ q_\\text{NULL-STACK}, q_\\text{NULL-STACK} \\right]$\n", 126 |     " * Action: $ \\text{argmax}_{a \\in A} \\ \\text{ActionChooser}(q_\\text{NULL-STACK}, q_\\text{NULL-STACK}, \\overbrace{q_\\text{the}}^\\text{lookahead}) \\Rightarrow \\text{SHIFT}$\n", 127 |     "\n", 128 |     "### Step 2\n", 129 |     " * Input Buffer: $\\left[ q_\\text{dog}, q_\\text{ran}, q_\\text{away}, q_\\text{END-INPUT} \\right]$\n", 130 |     " * Stack: $\\left[ q_\\text{NULL-STACK}, q_\\text{NULL-STACK}, q_\\text{the} \\right]$\n", 131 |     " * Action: $ \\text{argmax}_{a \\in A} \\ \\text{ActionChooser}(q_\\text{NULL-STACK}, q_\\text{the}, q_\\text{dog}) \\Rightarrow \\text{SHIFT}$\n", 132 |     "\n", 133 |     "### Step 3\n", 134 |     " * Input Buffer: $\\left[ q_\\text{ran}, q_\\text{away}, q_\\text{END-INPUT} \\right]$\n", 135 |     " * Stack: $\\left[ q_\\text{NULL-STACK}, q_\\text{NULL-STACK}, q_\\text{the}, q_\\text{dog} \\right]$\n", 136 |     " * Action: $ \\text{argmax}_{a 
\\in A} \\ \\text{ActionChooser}(q_\\text{the}, q_\\text{dog}, q_\\text{ran}) \\Rightarrow \\text{REDUCE-L}$\n", 137 |     "\n", 138 |     "### Step 4\n", 139 |     " * Input Buffer: $\\left[ q_\\text{ran}, q_\\text{away}, q_\\text{END-INPUT} \\right]$\n", 140 |     " * Stack: $\\left[ q_\\text{NULL-STACK}, q_\\text{NULL-STACK}, C(q_\\text{dog}, q_\\text{the}) \\right]$\n", 141 |     " \n", 142 |     "For each word $w_m$, the parser keeps track of: the embedding $q_{w_m}$, the word itself $w_m$, and the word's position in the sentence $m$. The combination action should store the word and the index for the head word in the relation. The combined embedding may be a function of the embeddings for both the head and modifier words.\n", 143 |     "\n", 144 |     "Before beginning, I recommend completing the parse, drawing the input buffer and stack at each step, and explicitly listing the arguments to the action chooser." 145 |    ] 146 |   }, 147 |   { 148 |    "cell_type": "markdown", 149 |    "metadata": {}, 150 |    "source": [ 151 |     "# 1. Managing and Updating the Parser State (1.5 points)\n", 152 |     "\n", 153 |     "In this part of the assignment, you will work with the ParserState class, which keeps track of the parser's input buffer and stack." 154 |    ] 155 |   }, 156 |   { 157 |    "cell_type": "markdown", 158 |    "metadata": {}, 159 |    "source": [ 160 |     "### Deliverable 1.1: Implementing Reduce (1 point)\n", 161 |     "Implement the reduction operation of the ParserState in parsing.py, in the function \\_reduce.\n", 162 |     "\n", 163 |     "The way reduction is done is slightly different from the notes. In the notes, reduction takes place between the top element of the stack and the first element of the input buffer. 
Here, reduction takes place between the top two elements of the stack.\n", 164 | "\n", 165 | "At this step, there are no embeddings, but don't forget to make the call to the combiner network component.\n", 166 | "\n", 167 | "Hints:\n", 168 | "* Before starting, read the comments in \\_reduce, and look at the \\_\\_init\\_\\_ function of ParserState to see how it represents the stack and input buffer.\n", 169 | "* The `StackEntry` and `DepGraphEdge` tuples will be part of your solution, so take a look at how these are used elsewhere in the source.\n", 170 | "* In particular, you will want to push a new `StackEntry` onto the stack, and return a `DepGraphEdge`.\n", 171 | "* If you have trouble understanding the representation, print parser_state.stack or parser_state.input_buffer directly. (If you just print parser_state, it will output a pretty-printed version)." 172 | ] 173 | }, 174 | { 175 | "cell_type": "code", 176 | "execution_count": 6, 177 | "metadata": { 178 | "collapsed": false 179 | }, 180 | "outputs": [ 181 | { 182 | "name": "stdout", 183 | "output_type": "stream", 184 | "text": [ 185 | "Stack: []\n", 186 | "Input Buffer: ['They', 'can', 'fish', '']\n", 187 | "\n", 188 | "Stack: ['They', 'can']\n", 189 | "Input Buffer: ['fish', '']\n", 190 | "\n", 191 | "Reduction Made Edge: Head: ('can', 1), Modifier: ('They', 0) \n", 192 | "\n", 193 | "Stack: ['can']\n", 194 | "Input Buffer: ['fish', '']\n", 195 | "\n" 196 | ] 197 | } 198 | ], 199 | "source": [ 200 | "test_sentence = \"They can fish\".split()+[consts.END_OF_INPUT_TOK]\n", 201 | "parser_state = parsing.ParserState(test_sentence, [None] * len(test_sentence), utils.DummyCombiner())\n", 202 | "\n", 203 | "print parser_state\n", 204 | "\n", 205 | "parser_state.shift()\n", 206 | "parser_state.shift()\n", 207 | "print parser_state\n", 208 | "\n", 209 | "reduction = parser_state.reduce_left()\n", 210 | "print \"Reduction Made Edge: Head: {}, Modifier: {}\".format(reduction[0], reduction[1]), \"\\n\"\n", 211 | 
"print parser_state" 212 |    ] 213 |   }, 214 |   { 215 |    "cell_type": "markdown", 216 |    "metadata": {}, 217 |    "source": [ 218 |     "### Deliverable 1.2: Parser Terminating Condition (0.5 points)\n", 219 |     "In this short (one line) deliverable, implement done_parsing() in ParserState. Note\n", 220 |     "we add an END_OF_INPUT_TOK token to the end of the sentence (this token could be a helpful feature). Think about what the input buffer and stack look like at the end of a parse." 221 |    ] 222 |   }, 223 |   { 224 |    "cell_type": "code", 225 |    "execution_count": 7, 226 |    "metadata": { 227 |     "collapsed": false 228 |    }, 229 |    "outputs": [ 230 |     { 231 |      "name": "stdout", 232 |      "output_type": "stream", 233 |      "text": [ 234 |       "Stack: ['can']\n", 235 |       "Input Buffer: ['fish', '']\n", 236 |       " False \n", 237 |       "\n", 238 |       "Stack: ['can', 'fish']\n", 239 |       "Input Buffer: ['']\n", 240 |       " False \n", 241 |       "\n", 242 |       "Stack: ['can']\n", 243 |       "Input Buffer: ['']\n", 244 |       " True \n", 245 |       "\n" 246 |      ] 247 |     } 248 |    ], 249 |    "source": [ 250 |     "print parser_state, parser_state.done_parsing(),'\\n'\n", 251 |     "parser_state.shift()\n", 252 |     "print parser_state, parser_state.done_parsing(),'\\n'\n", 253 |     "parser_state.reduce_right()\n", 254 |     "print parser_state, parser_state.done_parsing(),'\\n'" 255 |    ] 256 |   }, 257 |   { 258 |    "cell_type": "markdown", 259 |    "metadata": {}, 260 |    "source": [ 261 |     "# 2. Neural Network for Action Decisions (3.5 points)\n", 262 |     "In this part of the assignment, you will use PyTorch to create a neural network which examines the current state of the parse and makes the decision to either shift, reduce left, or reduce right." 263 |    ] 264 |   }, 265 |   { 266 |    "cell_type": "markdown", 267 |    "metadata": {}, 268 |    "source": [ 269 |     "### Deliverable 2.1: Word Embedding Lookup (1 point)\n", 270 |     "Implement the class `VanillaWordEmbeddingLookup` in `neural_net.py`.\n", 271 |     "\n", 272 |     "This involves adding code to the `__init__` and `forward` methods. 
\n", 273 | "- In the `__init__` method, you want make sure that instances of the class can store the embeddings\n", 274 | "- In the `forward` method, you should return a list of Torch variables, representing the looked up embeddings for each word in the sequence \n", 275 | "\n", 276 | "If you didn't do the tutorial, you will want to read the [docs](http://pytorch.org/docs/nn.html#embedding) on how to create a lookup table for your word embeddings.\n", 277 | "\n", 278 | "Hint: You will have to turn the input, which is a list of strings (the words in the sentence), into a format that your embedding lookup table can take, which is a torch.LongTensor. So that we can automatically backprop, it is wrapped in a Variable. utils.sequence_to_variable takes care of this for you." 279 | ] 280 | }, 281 | { 282 | "cell_type": "code", 283 | "execution_count": 52, 284 | "metadata": { 285 | "collapsed": false 286 | }, 287 | "outputs": [ 288 | { 289 | "name": "stdout", 290 | "output_type": "stream", 291 | "text": [ 292 | "\n", 293 | "2 \n", 294 | "\n", 295 | "Embedding for William:\n", 296 | " Variable containing:\n", 297 | "-2.9718 1.7070 -0.4305 -2.2820 0.5237\n", 298 | "[torch.FloatTensor of size 1x5]\n", 299 | "\n" 300 | ] 301 | } 302 | ], 303 | "source": [ 304 | "torch.manual_seed(1) # DO NOT CHANGE\n", 305 | "reload(neural_net)\n", 306 | "test_sentence = \"William Faulkner\".split()\n", 307 | "test_word_to_ix = { \"William\": 0, \"Faulkner\": 1 }\n", 308 | "\n", 309 | "word_embedder = neural_net.VanillaWordEmbeddingLookup(test_word_to_ix, TEST_EMBEDDING_DIM)\n", 310 | "embeds = word_embedder(test_sentence)\n", 311 | "print type(embeds)\n", 312 | "print len(embeds), \"\\n\"\n", 313 | "print \"Embedding for William:\\n {}\".format(embeds[0])" 314 | ] 315 | }, 316 | { 317 | "cell_type": "markdown", 318 | "metadata": {}, 319 | "source": [ 320 | "### Deliverable 2.2: Feature Extraction (0.5 points)\n", 321 | "Fill in the SimpleFeatureExtractor class in feat_extractors.py to give 
the following 3 features\n", 322 | "* The embedding of the 2nd to top of the stack\n", 323 | "* The embedding of the top of the stack\n", 324 | "* The embedding of the next token in the input buffer (one-token lookahead)\n", 325 | "\n", 326 | "If at this point you have not poked around ParserState to see how it stores the state, now would be a good time." 327 | ] 328 | }, 329 | { 330 | "cell_type": "code", 331 | "execution_count": 53, 332 | "metadata": { 333 | "collapsed": false 334 | }, 335 | "outputs": [ 336 | { 337 | "name": "stdout", 338 | "output_type": "stream", 339 | "text": [ 340 | "Embedding for 'The':\n", 341 | " Variable containing:\n", 342 | " 0.8407 0.5510 0.3863 0.9124 -0.8410\n", 343 | "[torch.FloatTensor of size 1x5]\n", 344 | "\n", 345 | "Embedding for 'Sound':\n", 346 | " Variable containing:\n", 347 | "-2.9718 1.7070 -0.4305 -2.2820 0.5237\n", 348 | "[torch.FloatTensor of size 1x5]\n", 349 | "\n", 350 | "Embedding for 'and' (from buffer lookahead):\n", 351 | " Variable containing:\n", 352 | " 0.0004 -1.2039 3.5283 0.4434 0.5848\n", 353 | "[torch.FloatTensor of size 1x5]\n", 354 | "\n" 355 | ] 356 | } 357 | ], 358 | "source": [ 359 | "torch.manual_seed(1)\n", 360 | "test_sentence = \"The Sound and the Fury\".split()\n", 361 | "test_word_to_ix = { word: i for i, word in enumerate(set(test_sentence)) }\n", 362 | "\n", 363 | "embedder = neural_net.VanillaWordEmbeddingLookup(test_word_to_ix, TEST_EMBEDDING_DIM)\n", 364 | "embeds = embedder(test_sentence)\n", 365 | "\n", 366 | "state = parsing.ParserState(test_sentence, embeds, utils.DummyCombiner())\n", 367 | "\n", 368 | "state.shift()\n", 369 | "state.shift()\n", 370 | "feat_extractor = feat_extractors.SimpleFeatureExtractor()\n", 371 | "feats = feat_extractor.get_features(state)\n", 372 | "\n", 373 | "print \"Embedding for 'The':\\n {}\".format(feats[0])\n", 374 | "print \"Embedding for 'Sound':\\n {}\".format(feats[1])\n", 375 | "print \"Embedding for 'and' (from buffer lookahead):\\n 
{}\".format(feats[2])" 376 | ] 377 | }, 378 | { 379 | "cell_type": "markdown", 380 | "metadata": {}, 381 | "source": [ 382 | "### Deliverable 2.3: MLP for Choosing Actions (1 point)\n", 383 | "\n", 384 | "Implement the class `neural_net.ActionChooserNetwork` according to the specification in `neural_net.py`.\n", 385 | "\n", 386 | "You will want to use the `utils.concat_and_flatten` function. We provide this function because the Tensor reshaping code can get somewhat terse. It takes the list of embeddings passed in (that come from your feature extractor) and concatenates them to one long row vector.\n", 387 | "\n", 388 | "This network takes as input the features from your feature extractor, concatenates them, runs them through an MLP and outputs log probabilities over actions.\n", 389 | "\n", 390 | "Hint:\n", 391 | "\n", 392 | "- http://pytorch.org/docs/nn.html#non-linear-activations\n", 393 | "- http://pytorch.org/docs/nn.html#linear-layers" 394 | ] 395 | }, 396 | { 397 | "cell_type": "code", 398 | "execution_count": 54, 399 | "metadata": { 400 | "collapsed": false 401 | }, 402 | "outputs": [ 403 | { 404 | "name": "stdout", 405 | "output_type": "stream", 406 | "text": [ 407 | "Variable containing:\n", 408 | "-1.5347 -1.3445 -0.6466\n", 409 | "[torch.FloatTensor of size 1x3]\n", 410 | "\n" 411 | ] 412 | } 413 | ], 414 | "source": [ 415 | "torch.manual_seed(1) # DO NOT CHANGE, you can compare my output below to yours\n", 416 | "act_chooser = neural_net.ActionChooserNetwork(TEST_EMBEDDING_DIM * NUM_FEATURES)\n", 417 | "feats = [ ag.Variable(torch.randn(1, TEST_EMBEDDING_DIM)) for _ in xrange(NUM_FEATURES) ] # make some dummy feature embeddings\n", 418 | "log_probs = act_chooser(feats)\n", 419 | "print log_probs" 420 | ] 421 | }, 422 | { 423 | "cell_type": "markdown", 424 | "metadata": {}, 425 | "source": [ 426 | "### Deliverable 2.4: Network for Combining Stack Items (1 point)\n", 427 | "Implement the class `neural_net.MLPCombinerNetwork` according to the 
specification in `neural_net.py`.\n", 428 | "Again, `utils.concat_and_flatten` will come in handy.\n", 429 | "\n", 430 | "Recall that what this component does is take two embeddings, the head and modifier, during a reduction and output a combined embedding, which is then pushed back onto the stack during parsing." 431 | ] 432 | }, 433 | { 434 | "cell_type": "code", 435 | "execution_count": 55, 436 | "metadata": { 437 | "collapsed": false 438 | }, 439 | "outputs": [ 440 | { 441 | "name": "stdout", 442 | "output_type": "stream", 443 | "text": [ 444 | "Variable containing:\n", 445 | " 0.6063 -0.0110 0.6530 -0.6196 -0.1051\n", 446 | "[torch.FloatTensor of size 1x5]\n", 447 | "\n" 448 | ] 449 | } 450 | ], 451 | "source": [ 452 | "torch.manual_seed(1) # DO NOT CHANGE\n", 453 | "combiner = neural_net.MLPCombinerNetwork(TEST_EMBEDDING_DIM)\n", 454 | "\n", 455 | "# Again, make dummy inputs\n", 456 | "head_feat = ag.Variable(torch.randn(1, TEST_EMBEDDING_DIM))\n", 457 | "modifier_feat = ag.Variable(torch.randn(1, TEST_EMBEDDING_DIM))\n", 458 | "combined = combiner(head_feat, modifier_feat)\n", 459 | "print combined" 460 | ] 461 | }, 462 | { 463 | "cell_type": "markdown", 464 | "metadata": {}, 465 | "source": [ 466 | "# 3. 
Return of the Parser (2 points)" 467 | ] 468 | }, 469 | { 470 | "cell_type": "markdown", 471 | "metadata": {}, 472 | "source": [ 473 | "### Deliverable 3.1: Parser Training Code (1.5 points)\n", 474 | "\n", 475 | "**Note:** There are two unit tests for this deliverable, one worth 1 point, one worth 0.5.\n", 476 | "\n", 477 | "You will implement the forward() function in gtnlplib.parsing.TransitionParser.\n", 478 | "It is important to understand the difference between the following tasks:\n", 479 | "\n", 480 | "* Training: Training the model involves passing it sentences along with the correct sequence of actions, and updating weights.\n", 481 | "* Evaluation: We can evaluate the parser by passing it sentences along with the correct sequence of actions, and see how many actions it predicts correctly. This is identical to training, except the weights are not updated after making a prediction.\n", 482 | "* Prediction: After setting the weights, we give it a raw sentence (no gold-standard actions) and ask it for the correct dependency graph.\n", 483 | "\n", 484 | "At this point, it is necessary to have all of the components in place for constructing the parser.\n", 485 | "\n", 486 | "The parsing logic is roughly as follows:\n", 487 | "* Loop until the parsing state is in its terminating state (deliverable 1.2)\n", 488 | "* Get the features from the parsing state (deliverable 2.2)\n", 489 | "* Send them through your action chooser network to get log probabilities over actions (deliverable 2.3)\n", 490 | "* If you have `gold_actions`, perform the next gold action. Otherwise (when predicting), take the argmax of your log probabilities and perform that action.\n", 491 | " - Argmax is gross in PyTorch, so a function is provided for you in utils.argmax.\n", 492 | " - While the gold actions will always be valid, if you are not provided gold actions, you must make sure\n", 493 | " that any action you do is legal.
You cannot shift when the input buffer contains only `END_OF_INPUT_TOK` (this token should *NOT* be shifted onto the stack) and you cannot reduce when the stack contains fewer than 2 elements.\n", 494 | " **If your network chooses `SHIFT` when it is not legal, just do `REDUCE_R`**\n", 495 | "\n", 496 | "Make sure to keep track of everything the function is expected to return:\n", 497 | "* Do all of your actions by calling the appropriate function on your `parser_state`\n", 498 | "* Append each output `Variable` from your `action_chooser` to the outputs list\n", 499 | "* Append each action you do to `actions_done`" 500 | ] 501 | }, 502 | { 503 | "cell_type": "code", 504 | "execution_count": 8, 505 | "metadata": { 506 | "collapsed": true 507 | }, 508 | "outputs": [], 509 | "source": [ 510 | "test_sentence = \"The man ran away\".split()\n", 511 | "test_word_to_ix = { word: i for i, word in enumerate(set(test_sentence)) }\n", 512 | "test_word_to_ix[consts.END_OF_INPUT_TOK] = len(test_word_to_ix)\n", 513 | "test_sentence_vocab = set(test_sentence)\n", 514 | "gold_actions = [\"SHIFT\", \"SHIFT\", \"REDUCE_L\", \"SHIFT\", \"REDUCE_L\", \"SHIFT\", \"REDUCE_R\"]" 515 | ] 516 | }, 517 | { 518 | "cell_type": "code", 519 | "execution_count": 9, 520 | "metadata": { 521 | "collapsed": false 522 | }, 523 | "outputs": [], 524 | "source": [ 525 | "feat_extractor = feat_extractors.SimpleFeatureExtractor()\n", 526 | "word_embedding_lookup = neural_net.VanillaWordEmbeddingLookup(test_word_to_ix, STACK_EMBEDDING_DIM)\n", 527 | "action_chooser = neural_net.ActionChooserNetwork(STACK_EMBEDDING_DIM * NUM_FEATURES)\n", 528 | "combiner_network = neural_net.MLPCombinerNetwork(STACK_EMBEDDING_DIM)\n", 529 | "parser = parsing.TransitionParser(feat_extractor, word_embedding_lookup,\n", 530 | " action_chooser, combiner_network)" 531 | ] 532 | }, 533 | { 534 | "cell_type": "code", 535 | "execution_count": 10, 536 | "metadata": { 537 | "collapsed": false 538 | }, 539 | "outputs": [ 540 | {
541 | "name": "stdout", 542 | "output_type": "stream", 543 | "text": [ 544 | "set([DepGraphEdge(head=('ran', 2), modifier=('away', 3)), DepGraphEdge(head=('ran', 2), modifier=('man', 1)), DepGraphEdge(head=('', -1), modifier=('ran', 2)), DepGraphEdge(head=('man', 1), modifier=('The', 0))])\n", 545 | "[0, 0, 1, 0, 1, 0, 2]\n" 546 | ] 547 | } 548 | ], 549 | "source": [ 550 | "output, depgraph, actions_done = parser(test_sentence, gold_actions)\n", 551 | "print depgraph\n", 552 | "print actions_done" 553 | ] 554 | }, 555 | { 556 | "cell_type": "markdown", 557 | "metadata": {}, 558 | "source": [ 559 | "### Now Train the Parser!" 560 | ] 561 | }, 562 | { 563 | "cell_type": "markdown", 564 | "metadata": {}, 565 | "source": [ 566 | "Training your parser may take some time. On the test below, I get about 5 seconds per loop (i7 6700k). \n", 567 | "\n", 568 | "- There are 10,000 training sentences, so multiply this measurement by 100 to get your training time.\n", 569 | "- One optimization trick is to that if you can do several things with a single PyTorch call, this will probably be faster than writing a PyTorch call that does one thing, and then calling it several times. 
" 570 | ] 571 | }, 572 | { 573 | "cell_type": "code", 574 | "execution_count": 11, 575 | "metadata": { 576 | "collapsed": false 577 | }, 578 | "outputs": [], 579 | "source": [ 580 | "torch.manual_seed(1)\n", 581 | "feat_extractor = feat_extractors.SimpleFeatureExtractor()\n", 582 | "word_embedding_lookup = neural_net.VanillaWordEmbeddingLookup(word_to_ix, STACK_EMBEDDING_DIM)\n", 583 | "action_chooser = neural_net.ActionChooserNetwork(STACK_EMBEDDING_DIM * NUM_FEATURES)\n", 584 | "combiner_network = neural_net.MLPCombinerNetwork(STACK_EMBEDDING_DIM)\n", 585 | "parser = parsing.TransitionParser(feat_extractor, word_embedding_lookup,\n", 586 | " action_chooser, combiner_network)\n", 587 | "optimizer = optim.SGD(parser.parameters(), lr=ETA_0)" 588 | ] 589 | }, 590 | { 591 | "cell_type": "code", 592 | "execution_count": 12, 593 | "metadata": { 594 | "collapsed": false 595 | }, 596 | "outputs": [ 597 | { 598 | "name": "stdout", 599 | "output_type": "stream", 600 | "text": [ 601 | "Number of instances: 100 Number of network actions: 3898\n", 602 | "Acc: 0.687275525911 Loss: 27.0280368376\n", 603 | "Number of instances: 100 Number of network actions: 3898\n", 604 | "Acc: 0.837609030272 Loss: 15.3356624376\n", 605 | "Number of instances: 100 Number of network actions: 3898\n", 606 | "Acc: 0.91713699333 Loss: 8.99367914472\n", 607 | "Number of instances: 100 Number of network actions: 3898\n", 608 | "Acc: 0.956900974859 Loss: 4.93015527138\n", 609 | "1 loop, best of 3: 3.96 s per loop\n" 610 | ] 611 | } 612 | ], 613 | "source": [ 614 | "%%timeit\n", 615 | "parsing.train(dataset.training_data[:100], parser, optimizer)" 616 | ] 617 | }, 618 | { 619 | "cell_type": "code", 620 | "execution_count": 15, 621 | "metadata": { 622 | "collapsed": false 623 | }, 624 | "outputs": [ 625 | { 626 | "data": { 627 | "text/plain": [ 628 | "{DepGraphEdge(head=('', -1), modifier=('restrict', 4)),\n", 629 | " DepGraphEdge(head=('RTC', 6), modifier=('the', 5)),\n", 630 | " 
DepGraphEdge(head=('Treasury', 8), modifier=(',', 11)),\n", 631 | " DepGraphEdge(head=('Treasury', 8), modifier=('RTC', 6)),\n", 632 | " DepGraphEdge(head=('Treasury', 8), modifier=('only', 10)),\n", 633 | " DepGraphEdge(head=('Treasury', 8), modifier=('to', 7)),\n", 634 | " DepGraphEdge(head=('agency', 14), modifier=('the', 13)),\n", 635 | " DepGraphEdge(head=('authorization', 18), modifier=('congressional', 17)),\n", 636 | " DepGraphEdge(head=('authorization', 18), modifier=('specific', 16)),\n", 637 | " DepGraphEdge(head=('intends', 2), modifier=('The', 0)),\n", 638 | " DepGraphEdge(head=('intends', 2), modifier=('bill', 1)),\n", 639 | " DepGraphEdge(head=('only', 10), modifier=('borrowings', 9)),\n", 640 | " DepGraphEdge(head=('receives', 15), modifier=('agency', 14)),\n", 641 | " DepGraphEdge(head=('receives', 15), modifier=('authorization', 18)),\n", 642 | " DepGraphEdge(head=('restrict', 4), modifier=('.', 19)),\n", 643 | " DepGraphEdge(head=('restrict', 4), modifier=('intends', 2)),\n", 644 | " DepGraphEdge(head=('restrict', 4), modifier=('to', 3)),\n", 645 | " DepGraphEdge(head=('restrict', 4), modifier=('unless', 12)),\n", 646 | " DepGraphEdge(head=('unless', 12), modifier=('Treasury', 8)),\n", 647 | " DepGraphEdge(head=('unless', 12), modifier=('receives', 15))}" 648 | ] 649 | }, 650 | "execution_count": 15, 651 | "metadata": {}, 652 | "output_type": "execute_result" 653 | } 654 | ], 655 | "source": [ 656 | "# if this call doesn't work, something is wrong with your parser's behavior when gold labels aren't provided\n", 657 | "parser.predict(dataset.dev_data[0].sentence)" 658 | ] 659 | }, 660 | { 661 | "cell_type": "code", 662 | "execution_count": 7, 663 | "metadata": { 664 | "collapsed": false 665 | }, 666 | "outputs": [ 667 | { 668 | "name": "stdout", 669 | "output_type": "stream", 670 | "text": [ 671 | "Epoch 1\n", 672 | "Number of instances: 997 Number of network actions: 39025\n", 673 | "Acc: 0.825035233824 Loss: 17.4108023486\n", 674 | "Dev 
Evaluation\n", 675 | "Number of instances: 399 Number of network actions: 15719\n", 676 | "Acc: 0.823843755964 Loss: 16.9211852149\n", 677 | "F-Score: 0.500566790988\n", 678 | "Attachment Score: 0.486784960913\n", 679 | "\n", 680 | "\n" 681 | ] 682 | } 683 | ], 684 | "source": [ 685 | "# train the thing for a while here.\n", 686 | "# Shouldn't take too long, even on a laptop\n", 687 | "for epoch in xrange(1):\n", 688 | " print \"Epoch {}\".format(epoch+1)\n", 689 | " parsing.train(dataset.training_data[:1000], parser, optimizer, verbose=True)\n", 690 | " \n", 691 | " print \"Dev Evaluation\"\n", 692 | " parsing.evaluate(dataset.dev_data, parser, verbose=True)\n", 693 | " print \"F-Score: {}\".format(evaluation.compute_metric(parser, dataset.dev_data, evaluation.fscore))\n", 694 | " print \"Attachment Score: {}\".format(evaluation.compute_attachment(parser, dataset.dev_data))\n", 695 | " print \"\\n\"" 696 | ] 697 | }, 698 | { 699 | "cell_type": "markdown", 700 | "metadata": {}, 701 | "source": [ 702 | "### Deliverable 3.2: Test Data Predictions (0.5 points 4650, 0.25 points 7650)\n", 703 | "Run the code below to output your predictions on the test data and dev data. You can run the dev test to verify you are correct up to this point. The test data evaluation is for us." 704 | ] 705 | }, 706 | { 707 | "cell_type": "code", 708 | "execution_count": 8, 709 | "metadata": { 710 | "collapsed": true 711 | }, 712 | "outputs": [], 713 | "source": [ 714 | "dev_sentences = [ sentence for sentence, _ in dataset.dev_data ]\n", 715 | "evaluation.output_preds(consts.D3_2_DEV_FILENAME, parser, dev_sentences)" 716 | ] 717 | }, 718 | { 719 | "cell_type": "code", 720 | "execution_count": 9, 721 | "metadata": { 722 | "collapsed": false 723 | }, 724 | "outputs": [], 725 | "source": [ 726 | "evaluation.output_preds(consts.D3_2_TEST_FILENAME, parser, dataset.test_data)" 727 | ] 728 | }, 729 | { 730 | "cell_type": "markdown", 731 | "metadata": {}, 732 | "source": [ 733 | "# 4. 
Evaluation and Training Improvements (3 points)" 734 | ] 735 | }, 736 | { 737 | "cell_type": "markdown", 738 | "metadata": {}, 739 | "source": [ 740 | "### Deliverable 4.1: Better Word Embeddings (1 point 4650, 0.5 points 7650)\n", 741 | "Implement the class BiLSTMWordEmbeddingLookup in neural_net.py.\n", 742 | "This class can replace your VanillaWordEmbeddingLookup.\n", 743 | "This class implements a sequence model over the sentence, where the t'th word's embedding is the hidden state at timestep t.\n", 744 | "This means that, rather than have our embeddings on the stack only include the semantics of a single word, our embeddings will contain information from all parts of the sentence (the LSTM will, in principle, learn what information is relevant)." 745 | ] 746 | }, 747 | { 748 | "cell_type": "code", 749 | "execution_count": 63, 750 | "metadata": { 751 | "collapsed": false 752 | }, 753 | "outputs": [ 754 | { 755 | "name": "stdout", 756 | "output_type": "stream", 757 | "text": [ 758 | "\n", 759 | "2 \n", 760 | "\n", 761 | "Embedding for Michael:\n", 762 | " Variable containing:\n", 763 | "\n", 764 | "Columns 0 to 9 \n", 765 | "-0.0134 -0.0766 -0.0746 0.0530 -0.0202 0.1845 -0.1455 -0.0734 -0.0072 0.0781\n", 766 | "\n", 767 | "Columns 10 to 19 \n", 768 | " 0.0354 -0.0723 0.0160 0.0915 -0.0200 0.1126 0.1395 0.0041 0.0919 0.0251\n", 769 | "\n", 770 | "Columns 20 to 29 \n", 771 | " 0.3126 0.0233 0.1408 0.1407 -0.2879 -0.1591 -0.0579 0.0207 0.0364 -0.3148\n", 772 | "\n", 773 | "Columns 30 to 39 \n", 774 | "-0.4017 0.1126 0.2589 0.0505 -0.1529 -0.0149 0.0705 0.0419 -0.1842 0.1084\n", 775 | "\n", 776 | "Columns 40 to 49 \n", 777 | "-0.1632 -0.0252 -0.0965 -0.0090 0.1427 0.1717 0.1267 -0.0724 0.3383 -0.0991\n", 778 | "\n", 779 | "Columns 50 to 59 \n", 780 | " 0.2505 -0.1585 -0.0338 0.2543 0.1364 0.1747 -0.0128 0.0472 -0.0284 -0.1095\n", 781 | "\n", 782 | "Columns 60 to 69 \n", 783 | "-0.2905 0.1631 0.0890 0.1824 0.0406 0.0039 -0.0506 -0.0266 0.0073 0.1715\n", 784 | "\n", 
785 | "Columns 70 to 79 \n", 786 | " 0.0092 -0.3738 -0.0689 0.0460 0.1567 -0.0565 0.1381 0.0503 -0.0933 0.1842\n", 787 | "\n", 788 | "Columns 80 to 89 \n", 789 | "-0.0477 0.1206 0.0543 0.0678 -0.0886 0.0467 -0.2502 0.0426 -0.0566 -0.0431\n", 790 | "\n", 791 | "Columns 90 to 99 \n", 792 | " 0.0637 -0.0667 0.0312 -0.1330 -0.1285 -0.0477 0.0292 -0.1092 -0.0594 0.0528\n", 793 | "[torch.FloatTensor of size 1x100]\n", 794 | "\n" 795 | ] 796 | } 797 | ], 798 | "source": [ 799 | "torch.manual_seed(1) # DO NOT CHANGE\n", 800 | "test_sentence = \"Michael Collins\".split()\n", 801 | "test_word_to_ix = { \"Michael\": 0, \"Collins\": 1 }\n", 802 | "\n", 803 | "lstm_word_embedder = neural_net.BiLSTMWordEmbeddingLookup(test_word_to_ix,\n", 804 | " WORD_EMBEDDING_DIM,\n", 805 | " STACK_EMBEDDING_DIM,\n", 806 | " num_layers=LSTM_NUM_LAYERS,\n", 807 | " dropout=DROPOUT)\n", 808 | " \n", 809 | "lstm_embeds = lstm_word_embedder(test_sentence)\n", 810 | "print type(lstm_embeds)\n", 811 | "print len(lstm_embeds), \"\\n\"\n", 812 | "print \"Embedding for Michael:\\n {}\".format(lstm_embeds[0])" 813 | ] 814 | }, 815 | { 816 | "cell_type": "markdown", 817 | "metadata": {}, 818 | "source": [ 819 | "### Deliverable 4.2: Pretrained Embeddings (0.5 points)\n", 820 | "\n", 821 | "Fill in the function `initialize_with_pretrained` in `utils.py`.\n", 822 | "\n", 823 | "It will take a word embedding lookup component and initialize its lookup table with pretrained embeddings.\n", 824 | "\n", 825 | "Note that you can create a Torch variable from a list of floats using `torch.Tensor()`. Googling for more information about how Torch stores parameters is allowed, I don't think you'll find the exact answer online (corollary: do not post the answer online)." 
826 | ] 827 | }, 828 | { 829 | "cell_type": "code", 830 | "execution_count": 21, 831 | "metadata": { 832 | "collapsed": false 833 | }, 834 | "outputs": [ 835 | { 836 | "name": "stdout", 837 | "output_type": "stream", 838 | "text": [ 839 | "[0.12429751455783844, -0.11472601443529129, -0.5684014558792114, -0.396965891122818, 0.22938089072704315]\n" 840 | ] 841 | } 842 | ], 843 | "source": [ 844 | "import cPickle\n", 845 | "pretrained_embeds = cPickle.load(open(consts.PRETRAINED_EMBEDS_FILE))\n", 846 | "print pretrained_embeds['four'][:5]" 847 | ] 848 | }, 849 | { 850 | "cell_type": "code", 851 | "execution_count": 22, 852 | "metadata": { 853 | "collapsed": true 854 | }, 855 | "outputs": [], 856 | "source": [ 857 | "embedder = neural_net.VanillaWordEmbeddingLookup(word_to_ix,64)" 858 | ] 859 | }, 860 | { 861 | "cell_type": "code", 862 | "execution_count": 23, 863 | "metadata": { 864 | "collapsed": false 865 | }, 866 | "outputs": [ 867 | { 868 | "data": { 869 | "text/plain": [ 870 | "Variable containing:\n", 871 | " 0.6730\n", 872 | "-1.7911\n", 873 | " 1.4701\n", 874 | " 1.5589\n", 875 | "-2.0735\n", 876 | "[torch.FloatTensor of size 5]" 877 | ] 878 | }, 879 | "execution_count": 23, 880 | "metadata": {}, 881 | "output_type": "execute_result" 882 | } 883 | ], 884 | "source": [ 885 | "embedder.forward(['four'])[0][0,:5]" 886 | ] 887 | }, 888 | { 889 | "cell_type": "code", 890 | "execution_count": 24, 891 | "metadata": { 892 | "collapsed": false 893 | }, 894 | "outputs": [ 895 | { 896 | "name": "stdout", 897 | "output_type": "stream", 898 | "text": [ 899 | "Variable containing:\n", 900 | " 0.1243\n", 901 | "-0.1147\n", 902 | "-0.5684\n", 903 | "-0.3970\n", 904 | " 0.2294\n", 905 | "[torch.FloatTensor of size 5]\n", 906 | "\n" 907 | ] 908 | } 909 | ], 910 | "source": [ 911 | "reload(utils);\n", 912 | "utils.initialize_with_pretrained(pretrained_embeds,embedder)\n", 913 | "print embedder.forward(['four'])[0][0,:5]" 914 | ] 915 | }, 916 | { 917 | "cell_type": "markdown", 
918 | "metadata": {}, 919 | "source": [ 920 | "### Deliverable 4.3: Better Reduction Combination (1 point)\n", 921 | "\n", 922 | "Before, in order to combine two embeddings during a reduction, we just passed them through an MLP and got a dense output. Now, we will instead use a sequence model of the stack. The combined embedding from a reduction is the next time step of an LSTM. Implement `LSTMCombinerNetwork` in `neural_network.py`." 923 | ] 924 | }, 925 | { 926 | "cell_type": "code", 927 | "execution_count": 25, 928 | "metadata": { 929 | "collapsed": true 930 | }, 931 | "outputs": [], 932 | "source": [ 933 | "reload(neural_net);\n", 934 | "TEST_EMBEDDING_DIM = 5\n", 935 | "combiner = neural_net.LSTMCombinerNetwork(TEST_EMBEDDING_DIM, 1, 0.0)\n", 936 | "head_feat = ag.Variable(torch.randn(1, TEST_EMBEDDING_DIM))\n", 937 | "modifier_feat = ag.Variable(torch.randn(1, TEST_EMBEDDING_DIM))" 938 | ] 939 | }, 940 | { 941 | "cell_type": "code", 942 | "execution_count": 26, 943 | "metadata": { 944 | "collapsed": false 945 | }, 946 | "outputs": [ 947 | { 948 | "data": { 949 | "text/plain": [ 950 | "Variable containing:\n", 951 | "(0 ,.,.) 
= \n", 952 | "\n", 953 | "Columns 0 to 8 \n", 954 | " -0.7641 -0.2784 0.6002 0.0032 -1.3923 -0.5975 0.2761 1.4585 1.2168\n", 955 | "\n", 956 | "Columns 9 to 9 \n", 957 | " -0.0510\n", 958 | "[torch.FloatTensor of size 1x1x10]" 959 | ] 960 | }, 961 | "execution_count": 26, 962 | "metadata": {}, 963 | "output_type": "execute_result" 964 | } 965 | ], 966 | "source": [ 967 | "utils.concat_and_flatten([head_feat,modifier_feat]).view(1,1,-1)" 968 | ] 969 | }, 970 | { 971 | "cell_type": "code", 972 | "execution_count": 27, 973 | "metadata": { 974 | "collapsed": false 975 | }, 976 | "outputs": [ 977 | { 978 | "name": "stdout", 979 | "output_type": "stream", 980 | "text": [ 981 | "Variable containing:\n", 982 | " 0.2730 0.0426 -0.1535 0.0089 0.0604\n", 983 | "[torch.FloatTensor of size 1x5]\n", 984 | "\n", 985 | "Variable containing:\n", 986 | " 0.3838 0.0589 -0.1849 0.0171 0.0936\n", 987 | "[torch.FloatTensor of size 1x5]\n", 988 | "\n", 989 | "Variable containing:\n", 990 | " 0.4329 0.0679 -0.1930 0.0236 0.1142\n", 991 | "[torch.FloatTensor of size 1x5]\n", 992 | "\n" 993 | ] 994 | } 995 | ], 996 | "source": [ 997 | "# note that the output keeps changing, because of the recurrent update\n", 998 | "for _ in xrange(3):\n", 999 | " print combiner(head_feat,modifier_feat)" 1000 | ] 1001 | }, 1002 | { 1003 | "cell_type": "markdown", 1004 | "metadata": {}, 1005 | "source": [ 1006 | "### Retrain with the new components\n", 1007 | "\n", 1008 | "The code below retrains your parser using all the new components that you just wrote." 
1009 | ] 1010 | }, 1011 | { 1012 | "cell_type": "code", 1013 | "execution_count": 16, 1014 | "metadata": { 1015 | "collapsed": false 1016 | }, 1017 | "outputs": [], 1018 | "source": [ 1019 | "feat_extractor = feat_extractors.SimpleFeatureExtractor()\n", 1020 | "\n", 1021 | "# BiLSTM over word embeddings\n", 1022 | "word_embedding_lookup = neural_net.BiLSTMWordEmbeddingLookup(word_to_ix,\n", 1023 | " WORD_EMBEDDING_DIM,\n", 1024 | " STACK_EMBEDDING_DIM,\n", 1025 | " num_layers=LSTM_NUM_LAYERS,\n", 1026 | " dropout=DROPOUT)\n", 1027 | "# pretrained inputs\n", 1028 | "utils.initialize_with_pretrained(pretrained_embeds, word_embedding_lookup)\n", 1029 | "\n", 1030 | "action_chooser = neural_net.ActionChooserNetwork(STACK_EMBEDDING_DIM * NUM_FEATURES)\n", 1031 | "\n", 1032 | "# LSTM reduction operations\n", 1033 | "combiner = neural_net.LSTMCombinerNetwork(STACK_EMBEDDING_DIM,\n", 1034 | " num_layers=LSTM_NUM_LAYERS,\n", 1035 | " dropout=DROPOUT)\n", 1036 | "\n", 1037 | "parser = parsing.TransitionParser(feat_extractor, word_embedding_lookup,\n", 1038 | " action_chooser, combiner)\n", 1039 | "\n", 1040 | "optimizer = optim.SGD(parser.parameters(), lr=ETA_0)" 1041 | ] 1042 | }, 1043 | { 1044 | "cell_type": "code", 1045 | "execution_count": 17, 1046 | "metadata": { 1047 | "collapsed": false 1048 | }, 1049 | "outputs": [ 1050 | { 1051 | "name": "stdout", 1052 | "output_type": "stream", 1053 | "text": [ 1054 | "Number of instances: 100 Number of network actions: 3898\n", 1055 | "Acc: 0.668291431503 Loss: 28.3679159951\n", 1056 | "Number of instances: 100 Number of network actions: 3898\n", 1057 | "Acc: 0.806824012314 Loss: 18.1205399126\n", 1058 | "Number of instances: 100 Number of network actions: 3898\n", 1059 | "Acc: 0.855310415598 Loss: 13.5849320753\n", 1060 | "Number of instances: 100 Number of network actions: 3898\n", 1061 | "Acc: 0.880708055413 Loss: 10.9105038324\n", 1062 | "1 loop, best of 3: 3.9 s per loop\n" 1063 | ] 1064 | } 1065 | ], 1066 | "source": [ 1067 
| "%%timeit\n", 1068 | "# The LSTMs will make this take longer\n", 1069 | "parsing.train(dataset.training_data[:100], parser, optimizer)" 1070 | ] 1071 | }, 1072 | { 1073 | "cell_type": "code", 1074 | "execution_count": 18, 1075 | "metadata": { 1076 | "collapsed": false 1077 | }, 1078 | "outputs": [ 1079 | { 1080 | "name": "stdout", 1081 | "output_type": "stream", 1082 | "text": [ 1083 | "Epoch 1\n", 1084 | "Number of instances: 997 Number of network actions: 39025\n", 1085 | "Acc: 0.896809737348 Loss: 10.2798197525\n", 1086 | "Dev Evaluation\n", 1087 | "Number of instances: 399 Number of network actions: 15719\n", 1088 | "Acc: 0.898403206311 Loss: 9.99048229483\n", 1089 | "F-Score: 0.698743236193\n", 1090 | "Attachment Score: 0.691897257724\n", 1091 | "\n", 1092 | "\n" 1093 | ] 1094 | } 1095 | ], 1096 | "source": [ 1097 | "for epoch in xrange(1):\n", 1098 | " print \"Epoch {}\".format(epoch+1)\n", 1099 | " \n", 1100 | " parser.train() # turn on dropout layers if they are there\n", 1101 | " parsing.train(dataset.training_data[:1000], parser, optimizer, verbose=True)\n", 1102 | " \n", 1103 | " print \"Dev Evaluation\"\n", 1104 | " parser.eval() # turn them off for evaluation\n", 1105 | " parsing.evaluate(dataset.dev_data, parser, verbose=True)\n", 1106 | " print \"F-Score: {}\".format(evaluation.compute_metric(parser, dataset.dev_data, evaluation.fscore))\n", 1107 | " print \"Attachment Score: {}\".format(evaluation.compute_attachment(parser, dataset.dev_data))\n", 1108 | " print \"\\n\"" 1109 | ] 1110 | }, 1111 | { 1112 | "cell_type": "markdown", 1113 | "metadata": {}, 1114 | "source": [ 1115 | "### Deliverable 4.4: Test Predictions (0.5 points 4650, 0.25 points 7650)\n", 1116 | "\n", 1117 | "Run the code below to generate test predictions" 1118 | ] 1119 | }, 1120 | { 1121 | "cell_type": "code", 1122 | "execution_count": 19, 1123 | "metadata": { 1124 | "collapsed": true 1125 | }, 1126 | "outputs": [], 1127 | "source": [ 1128 | "dev_sentences = [ sentence for 
sentence, _ in dataset.dev_data ]\n", 1129 | "evaluation.output_preds(consts.D4_4_DEV_FILENAME, parser, dev_sentences)" 1130 | ] 1131 | }, 1132 | { 1133 | "cell_type": "code", 1134 | "execution_count": 20, 1135 | "metadata": { 1136 | "collapsed": false 1137 | }, 1138 | "outputs": [], 1139 | "source": [ 1140 | "evaluation.output_preds(consts.D4_4_TEST_FILENAME, parser, dataset.test_data)" 1141 | ] 1142 | }, 1143 | { 1144 | "cell_type": "markdown", 1145 | "metadata": {}, 1146 | "source": [ 1147 | "# 5. Bakeoff!\n", 1148 | "\n", 1149 | "**Bakeoff Link**: Please click [here](https://kaggle.com/join/deepdependencyparsinggtnlp) to join the contest.\n", 1150 | "\n", 1151 | "Try to implement new features and tune your network's architecture and hyperparameters to get the best network.\n", 1152 | "Section 3 of [this paper](https://pdfs.semanticscholar.org/55b8/1991fbb025038d98e8c71acf7dc2b78ee5e9.pdf) may help out with hyperparameter tuning if you are new to neural networks.\n", 1153 | "To get very competitive, it may be necessary to train for a large amount of time (leaving it running overnight should be fine). Here are some suggestions:\n", 1154 | "* Try customizing any of the 3 components (word embeddings, action choosing, combining) in clever ways. You can create new classes that expose the same public interface and use them here (just leave your required ones untouched).\n", 1155 | "* Try new features. Write new classes that expose the same public interface as SimpleFeatureExtractor. Try looking further into stack history, or more input buffer lookahead, or features based on the action sequence. The possibilities are endless.\n", 1156 | "* Tune your hyperparameters. Learning rate is the most important one.\n", 1157 | "* Try different optimizers. torch.optim has a ton of different training algorithms. SGD was used in this pset because it is fast, but it is the most vanilla of them.
Trying new ones will almost certainly boost performance.\n", 1158 | "* Try adding regularization to your network if you see evidence that it is overfitting.\n", 1159 | "* Check out [this book](http://www.deeplearningbook.org/), which is undoubtedly the best deep learning book (and it is free online!); it has great information on regularization, optimization, and different network architectures.\n", 1160 | "\n", 1161 | "From just picking good hyperparameters, I can get near state-of-the-art results with the components you have built.\n", 1162 | "\n", 1163 | "**Extra credit**: \n", 1164 | "- +0.3 if you beat the best TA/prof system.\n", 1165 | "- +0.2 if you are #1 in CS4650.\n", 1166 | "- +0.2 if you are #1 in CS7650." 1167 | ] 1168 | }, 1169 | { 1170 | "cell_type": "markdown", 1171 | "metadata": {}, 1172 | "source": [ 1173 | "# 6. 7650 only: comparing hyperparameters (1 point)\n", 1174 | "\n", 1175 | "Do a systematic comparison of one hyperparameter: could be input embedding size, stack embedding size, learning rate, dropout, or something you added for the bakeoff. Try at least five different values, using either your system from 4.4 or from the bakeoff. Explain what you tried and what you found in text-answers.md." 1176 | ] 1177 | }, 1178 | { 1179 | "cell_type": "markdown", 1180 | "metadata": {}, 1181 | "source": [ 1182 | "### Using CUDA\n", 1183 | "You can use CUDA to train your network, and you should expect a significant speedup if you have a decent GPU and the CUDA toolkit installed.\n", 1184 | "If you want to use CUDA in this assignment, change the HAVE_CUDA variable to True in constants.py, and uncomment the .to_cuda() and .to_cpu() lines below.\n", 1185 | "\n", 1186 | "We are not officially supporting CUDA, though. If you have problems installing or running CUDA, please just use the CPU; we cannot help you debug it." 
1187 | ] 1188 | }, 1189 | { 1190 | "cell_type": "code", 1191 | "execution_count": null, 1192 | "metadata": { 1193 | "collapsed": true 1194 | }, 1195 | "outputs": [], 1196 | "source": [ 1197 | "# Set your hyperparameters here\n", 1198 | "# e.g. learning rate, regularization, lr annealing, dimensionality of embeddings, number of epochs, early stopping, etc." 1199 | ] 1200 | }, 1201 | { 1202 | "cell_type": "code", 1203 | "execution_count": null, 1204 | "metadata": { 1205 | "collapsed": true 1206 | }, 1207 | "outputs": [], 1208 | "source": [ 1209 | "# Make whichever components you want to use. You can create your own if you have ideas for improvement.\n", 1210 | "# Also, choose an optimizer.\n", 1211 | "# Name your TransitionParser bakeoff_parser to output your predictions below\n", 1212 | "# bakeoff_parser = TransitionParser(...)" 1213 | ] 1214 | }, 1215 | { 1216 | "cell_type": "code", 1217 | "execution_count": null, 1218 | "metadata": { 1219 | "collapsed": true 1220 | }, 1221 | "outputs": [], 1222 | "source": [ 1223 | "# train for bakeoff\n", 1224 | "for epoch in xrange(NUM_EPOCHS):\n", 1225 | " print \"Epoch {}\".format(epoch+1)\n", 1226 | " \n", 1227 | " #parser.to_cuda() # uncomment to train on your GPU\n", 1228 | " parser.train() # turn on dropout layers if they are there\n", 1229 | " \n", 1230 | " # train on full training data\n", 1231 | " parsing.train(dataset.training_data, parser, optimizer, verbose=True)\n", 1232 | " \n", 1233 | " \n", 1234 | " print \"Dev Evaluation\"\n", 1235 | " #parser.to_cpu() # TODO: fix evaluation so you don't have to ship everything back to the CPU\n", 1236 | " parser.eval() # turn them off for evaluation\n", 1237 | " parsing.evaluate(dataset.dev_data, parser, verbose=True)\n", 1238 | " print \"F-Score: {}\".format(evaluation.compute_metric(parser, dataset.dev_data, evaluation.fscore))\n", 1239 | " print \"Attachment Score: {}\".format(evaluation.compute_attachment(parser, dataset.dev_data))\n", 1240 | " print \"\\n\"" 1241 | ] 1242 | }, 
1243 | { 1244 | "cell_type": "code", 1245 | "execution_count": null, 1246 | "metadata": { 1247 | "collapsed": true 1248 | }, 1249 | "outputs": [], 1250 | "source": [ 1251 | "evaluation.output_preds(\"bakeoff-test.preds\", parser, dataset.test_data)" 1252 | ] 1253 | }, 1254 | { 1255 | "cell_type": "code", 1256 | "execution_count": 35, 1257 | "metadata": { 1258 | "collapsed": true 1259 | }, 1260 | "outputs": [], 1261 | "source": [ 1262 | "evaluation.kaggle_output(\"KAGGLE-bakeoff-preds.csv\", parser, dataset.test_data)" 1263 | ] 1264 | } 1265 | ], 1266 | "metadata": { 1267 | "kernelspec": { 1268 | "display_name": "Python 2", 1269 | "language": "python2", 1270 | "name": "python2" 1271 | }, 1272 | "language_info": { 1273 | "codemirror_mode": { 1274 | "name": "ipython", 1275 | "version": 2 1276 | }, 1277 | "file_extension": ".py", 1278 | "mimetype": "text/x-python", 1279 | "name": "python", 1280 | "nbconvert_exporter": "python", 1281 | "pygments_lexer": "ipython2", 1282 | "version": "2.7.13" 1283 | } 1284 | }, 1285 | "nbformat": 4, 1286 | "nbformat_minor": 2 1287 | } 1288 | -------------------------------------------------------------------------------- /tests/test_parser.py: -------------------------------------------------------------------------------- 1 | from nose.tools import with_setup, eq_, assert_almost_equals, ok_ 2 | from gtnlplib.parsing import ParserState, TransitionParser, DepGraphEdge, train 3 | from gtnlplib.utils import DummyCombiner, DummyActionChooser, DummyWordEmbeddingLookup, DummyFeatureExtractor, initialize_with_pretrained 4 | from gtnlplib.data_tools import Dataset 5 | from gtnlplib.constants import TRAIN_FILE, DEV_FILE, TEST_FILE, END_OF_INPUT_TOK, NULL_STACK_TOK, D3_2_DEV_FILENAME, DEV_GOLD, D4_4_DEV_FILENAME 6 | from gtnlplib.evaluation import compute_metric, fscore, dependency_graph_from_oracle 7 | from gtnlplib.feat_extractors import SimpleFeatureExtractor 8 | from gtnlplib.neural_net import ActionChooserNetwork, MLPCombinerNetwork, 
VanillaWordEmbeddingLookup, BiLSTMWordEmbeddingLookup, LSTMCombinerNetwork 9 | 10 | import torch 11 | import torch.autograd as ag 12 | import torch.optim as optim 13 | 14 | EMBEDDING_DIM = 64 15 | TEST_EMBEDDING_DIM = 4 16 | NUM_FEATURES = 3 17 | 18 | def make_dummy_parser_state(sentence): 19 | dummy_embeds = [ w + "-EMBEDDING" for w in sentence ] + [END_OF_INPUT_TOK + "-EMBEDDING"] 20 | return ParserState(sentence + [END_OF_INPUT_TOK], dummy_embeds, DummyCombiner()) 21 | 22 | def make_list(var_list): 23 | return map(lambda x: x.view(-1).data.tolist(), var_list) 24 | 25 | def check_tensor_correctness(pairs): 26 | for pair in pairs: 27 | for pred, true in zip(pair[0], pair[1]): 28 | assert_almost_equals(pred, true, places=4) 29 | 30 | 31 | def setup(): 32 | global test_sent, gold, word_to_ix, vocab 33 | test_sent = "The man saw the dog with the telescope".split() + [END_OF_INPUT_TOK] 34 | gold = [ "SHIFT", "SHIFT", "REDUCE_L", "SHIFT", "REDUCE_L", "SHIFT", 35 | "SHIFT", "REDUCE_L", "REDUCE_R", "SHIFT", "SHIFT", "SHIFT", 36 | "REDUCE_L", "REDUCE_L", "REDUCE_R" ] 37 | vocab = set(test_sent) 38 | vocab.add(NULL_STACK_TOK) 39 | 40 | word_to_ix = { word: i for i, word in enumerate(vocab) } 41 | 42 | 43 | # ===-------------------------------------------------------------------------------------------=== 44 | # Section 1 tests 45 | # ===-------------------------------------------------------------------------------------------=== 46 | 47 | def test_stack_reduction_d1_1(): 48 | """ 1 point(s) """ 49 | 50 | global test_sent 51 | state = make_dummy_parser_state(test_sent) 52 | state.shift() 53 | state.shift() 54 | left_reduc = state.reduce_left() 55 | 56 | # head word should be "man" 57 | eq_(left_reduc[0][0], "man") 58 | eq_(left_reduc[0][1], 1) 59 | # dependent should be "The" 60 | eq_(left_reduc[1][0], "The") 61 | eq_(left_reduc[1][1], 0) 62 | 63 | # check the top of the stack to make sure combination was right 64 | eq_(state.stack[-1].headword, "man") 65 | 
eq_(state.stack[-1].headword_pos, 1) 66 | eq_(state.stack[-1].embedding, "man-EMBEDDING") 67 | 68 | state.shift() 69 | state.shift() 70 | right_reduc = state.reduce_right() 71 | 72 | # Head word should be "saw" 73 | eq_(right_reduc[0][0], "saw") 74 | eq_(right_reduc[0][1], 2) 75 | # dependent should be "the" 76 | eq_(right_reduc[1][0], "the") 77 | eq_(right_reduc[1][1], 3) 78 | 79 | eq_(state.stack[-1].headword, "saw") 80 | eq_(state.stack[-1].headword_pos, 2) 81 | eq_(state.stack[-1].embedding, "saw-EMBEDDING") 82 | 83 | 84 | def test_stack_terminating_cond_d1_2(): 85 | """ 0.5 point(s) """ 86 | global test_sent 87 | state = make_dummy_parser_state(test_sent[:-1]) 88 | 89 | assert not state.done_parsing() 90 | 91 | state.shift() 92 | state.shift() 93 | assert not state.done_parsing() 94 | 95 | state.shift() 96 | state.shift() 97 | state.reduce_left() 98 | state.reduce_left() 99 | state.reduce_left() 100 | 101 | assert not state.done_parsing() 102 | 103 | state.shift() 104 | state.shift() 105 | state.shift() 106 | state.shift() 107 | state.reduce_left() 108 | state.reduce_left() 109 | state.reduce_left() 110 | state.reduce_left() 111 | 112 | assert state.done_parsing() 113 | 114 | # //////////////////////////////////////////////////////////////////////////////////////////// 115 | 116 | # ===-------------------------------------------------------------------------------------------=== 117 | # Section 2 tests 118 | # ===-------------------------------------------------------------------------------------------=== 119 | 120 | def test_word_embed_lookup_d2_1(): 121 | """ 1 point(s) """ 122 | 123 | global test_sent, gold, word_to_ix, vocab 124 | torch.manual_seed(1) 125 | 126 | embedder = VanillaWordEmbeddingLookup(word_to_ix, TEST_EMBEDDING_DIM) 127 | embeds = embedder(test_sent) 128 | assert len(embeds) == len(test_sent) 129 | assert isinstance(embeds, list) 130 | assert isinstance(embeds[0], ag.Variable) 131 | assert embeds[0].size() == (1, TEST_EMBEDDING_DIM) 132 | 
133 | embeds_list = make_list(embeds) 134 | 135 | true = ([-1.8661, 1.4146, -1.8781, -0.4674], 136 | [-0.9596, 0.5489, -0.9901, -0.3826], 137 | [0.5237, 0.0004, -1.2039, 3.5283], 138 | [0.3056, 1.0386, 0.5206, -0.5006], 139 | [0.4434, 0.5848, 0.8407, 0.5510], 140 | [-0.7576, 0.4215, -0.4827, -1.1198], 141 | [0.3056, 1.0386, 0.5206, -0.5006], 142 | [-2.9718, 1.7070, -0.4305, -2.2820], 143 | [0.3863, 0.9124, -0.8410, 1.2282] ) 144 | pairs = zip(embeds_list, true) 145 | check_tensor_correctness(pairs) 146 | 147 | 148 | def test_feature_extraction_d2_2(): 149 | """ 0.5 point(s) """ 150 | 151 | global test_sent, gold, word_to_ix, vocab 152 | torch.manual_seed(1) 153 | 154 | feat_extractor = SimpleFeatureExtractor() 155 | embedder = VanillaWordEmbeddingLookup(word_to_ix, TEST_EMBEDDING_DIM) 156 | combiner = DummyCombiner() 157 | embeds = embedder(test_sent) 158 | state = ParserState(test_sent, embeds, combiner) 159 | 160 | state.shift() 161 | state.shift() 162 | 163 | feats = feat_extractor.get_features(state) 164 | feats_list = make_list(feats) 165 | true = ([ -1.8661, 1.4146, -1.8781, -0.4674 ], [ -0.9596, 0.5489, -0.9901, -0.3826 ], [ 0.5237, 0.0004, -1.2039, 3.5283 ]) 166 | pairs = zip(feats_list, true) 167 | check_tensor_correctness(pairs) 168 | 169 | 170 | def test_action_chooser_d2_3(): 171 | """ 1 point(s) """ 172 | torch.manual_seed(1) 173 | act_chooser = ActionChooserNetwork(NUM_FEATURES * EMBEDDING_DIM) 174 | dummy_feats = [ ag.Variable(torch.randn(1, EMBEDDING_DIM)) for _ in xrange(NUM_FEATURES) ] 175 | out = act_chooser(dummy_feats) 176 | out_list = out.view(-1).data.tolist() 177 | true_out = [ -0.9352, -1.2393, -1.1460 ] 178 | check_tensor_correctness([(out_list, true_out)]) 179 | 180 | 181 | def test_combiner_d2_4(): 182 | """ 1 point(s) """ 183 | 184 | torch.manual_seed(1) 185 | combiner = MLPCombinerNetwork(6) 186 | head_feat = ag.Variable(torch.randn(1, 6)) 187 | modifier_feat = ag.Variable(torch.randn(1, 6)) 188 | combined = combiner(head_feat, 
modifier_feat) 189 | combined_list = combined.view(-1).data.tolist() 190 | true_out = [ -0.4897, 0.4484, -0.0591, 0.1778, 0.4223, -0.0940 ] 191 | check_tensor_correctness([(combined_list, true_out)]) 192 | 193 | 194 | # ===-------------------------------------------------------------------------------------------=== 195 | # Section 3 tests 196 | # ===-------------------------------------------------------------------------------------------=== 197 | 198 | def test_parse_logic_d3_1(): 199 | """ 0.5 point(s) """ 200 | 201 | global test_sent, gold, word_to_ix, vocab 202 | torch.manual_seed(1) 203 | 204 | feat_extract = SimpleFeatureExtractor() 205 | word_embed = VanillaWordEmbeddingLookup(word_to_ix, TEST_EMBEDDING_DIM) 206 | act_chooser = ActionChooserNetwork(TEST_EMBEDDING_DIM * NUM_FEATURES) 207 | combiner = MLPCombinerNetwork(TEST_EMBEDDING_DIM) 208 | 209 | parser = TransitionParser(feat_extract, word_embed, act_chooser, combiner) 210 | output, dep_graph, actions_done = parser(test_sent[:-1], gold) 211 | 212 | assert len(output) == 15 # Made the right number of decisions 213 | 214 | # check one of the outputs 215 | checked_out = output[10].view(-1).data.tolist() 216 | true_out = [ -1.4737, -1.0875, -0.8350 ] 217 | check_tensor_correctness([(true_out, checked_out)]) 218 | 219 | true_dep_graph = dependency_graph_from_oracle(test_sent, gold) 220 | assert true_dep_graph == dep_graph 221 | assert actions_done == [ 0, 0, 1, 0, 1, 0, 0, 1, 2, 0, 0, 0, 1, 1, 2 ] 222 | 223 | def test_predict_after_train_d3_1(): 224 | """ 1 point(s) """ 225 | 226 | global test_sent, gold, word_to_ix, vocab 227 | torch.manual_seed(1) 228 | feat_extract = SimpleFeatureExtractor() 229 | word_embed = VanillaWordEmbeddingLookup(word_to_ix, TEST_EMBEDDING_DIM) 230 | act_chooser = ActionChooserNetwork(TEST_EMBEDDING_DIM * NUM_FEATURES) 231 | combiner = MLPCombinerNetwork(TEST_EMBEDDING_DIM) 232 | 233 | parser = TransitionParser(feat_extract, word_embed, act_chooser, combiner) 234 | 235 | # Train 
236 | for i in xrange(75): 237 | train([ (test_sent[:-1], gold) ], parser, optim.SGD(parser.parameters(), lr=0.01), verbose=False) 238 | 239 | # predict 240 | pred = parser.predict(test_sent[:-1]) 241 | gold_graph = dependency_graph_from_oracle(test_sent[:-1], gold) 242 | assert pred == gold_graph 243 | 244 | 245 | def test_dev_d3_2(): 246 | """ 0.5 point(s) / 0.25 point(s) (section dependent) """ 247 | preds = open(D3_2_DEV_FILENAME) 248 | gold = open(DEV_GOLD) 249 | 250 | correct = 0 251 | total = 0 252 | for p, g in zip(preds, gold): 253 | if p.strip() == "": 254 | assert g.strip() == "", "Mismatched blank lines" 255 | continue 256 | p_data = p.split("\t") 257 | g_data = g.split("\t") 258 | if p_data[3] == g_data[3]: 259 | correct += 1 260 | total += 1 261 | acc = float(correct) / total 262 | exp = 0.48 263 | assert acc > exp, "ERROR: Expected {} Got {}".format(exp, acc) 264 | 265 | 266 | # ===-------------------------------------------------------------------------------------------=== 267 | # Section 4 tests 268 | # ===-------------------------------------------------------------------------------------------=== 269 | 270 | def test_bilstm_word_embeds_d4_1(): 271 | """ 1 point(s) / 0.5 point(s) (section dependent) """ 272 | 273 | global test_sent, word_to_ix, vocab 274 | torch.manual_seed(1) 275 | 276 | embedder = BiLSTMWordEmbeddingLookup(word_to_ix, TEST_EMBEDDING_DIM, TEST_EMBEDDING_DIM, 1, 0.0) 277 | embeds = embedder(test_sent) 278 | assert len(embeds) == len(test_sent) 279 | assert isinstance(embeds, list) 280 | assert isinstance(embeds[0], ag.Variable) 281 | assert embeds[0].size() == (1, TEST_EMBEDDING_DIM) 282 | 283 | embeds_list = make_list(embeds) 284 | true = ( 285 | [ .4916, -.0168, .1719, .6615 ], 286 | [ .3756, -.0610, .1851, .2604 ], 287 | [ -.2655, -.1289, .1009, -.0016 ], 288 | [ -.1070, -.3971, .2414, -.2588 ], 289 | [ -.1717, -.4475, .2739, -.0465 ], 290 | [ 0.0684, -0.2586, 0.2123, -0.1832 ], 291 | [ -0.0775, -0.4308, 0.1844, -0.1146 ], 
292 | [ 0.4366, -0.0507, 0.1018, 0.4015 ], 293 | [ -0.1265, -0.2192, 0.0481, 0.1551 ]) 294 | 295 | pairs = zip(embeds_list, true) 296 | check_tensor_correctness(pairs) 297 | 298 | 299 | def test_pretrained_embeddings_d4_2(): 300 | """ 0.5 point(s) """ 301 | 302 | torch.manual_seed(1) 303 | word_to_ix = { "interest": 0, "rate": 1, "swap": 2 } 304 | pretrained = { "interest": [ -1.4, 2.6, 3.5 ], "swap": [ 1.6, 5.7, 3.2 ] } 305 | embedder = VanillaWordEmbeddingLookup(word_to_ix, 3) 306 | initialize_with_pretrained(pretrained, embedder) 307 | 308 | embeddings = embedder.word_embeddings.weight.data 309 | 310 | pairs = [] 311 | true_rate_embed = [ -2.2820, 0.5237, 0.0004 ] 312 | 313 | pairs.append((embeddings[word_to_ix["interest"]].tolist(), pretrained["interest"])) 314 | pairs.append((embeddings[word_to_ix["rate"]].tolist(), true_rate_embed)) 315 | pairs.append((embeddings[word_to_ix["swap"]].tolist(), pretrained["swap"])) 316 | check_tensor_correctness(pairs) 317 | 318 | 319 | def test_lstm_combiner_d4_3(): 320 | """ 1 point(s) """ 321 | 322 | torch.manual_seed(1) 323 | combiner = LSTMCombinerNetwork(TEST_EMBEDDING_DIM, 1, 0.0) 324 | head_feat = ag.Variable(torch.randn(1, TEST_EMBEDDING_DIM)) 325 | modifier_feat = ag.Variable(torch.randn(1, TEST_EMBEDDING_DIM)) 326 | 327 | # Do the combination a few times to make sure they implemented the sequential 328 | # part right 329 | combined = combiner(head_feat, modifier_feat) 330 | combined = combiner(head_feat, modifier_feat) 331 | combined = combiner(head_feat, modifier_feat) 332 | 333 | combined_list = combined.view(-1).data.tolist() 334 | 335 | true_out = [ 0.0873, -0.1837, 0.1975, -0.1166 ] 336 | check_tensor_correctness([(combined_list, true_out)]) 337 | 338 | 339 | def test_dev_preds_d4_4(): 340 | """ 0.5 point(s) / 0.25 point(s) (section dependent) """ 341 | 342 | preds = open(D4_4_DEV_FILENAME) 343 | gold = open(DEV_GOLD) 344 | 345 | correct = 0 346 | total = 0 347 | for p, g in zip(preds, gold): 348 | if p.strip() 
== "": 349 | assert g.strip() == "", "Mismatched blank lines" 350 | continue 351 | p_data = p.split("\t") 352 | g_data = g.split("\t") 353 | if p_data[3] == g_data[3]: 354 | correct += 1 355 | total += 1 356 | acc = float(correct) / total 357 | exp = 0.69 358 | assert acc > exp, "ERROR: Expected {} Got {}".format(exp, acc) 359 | -------------------------------------------------------------------------------- /text-answers.md: -------------------------------------------------------------------------------- 1 | # 6. 7650 only: comparing hyperparameters (1 point) 2 | 3 | Do a systematic comparison of one hyperparameter: could be input embedding size, stack embedding size, learning rate, dropout, or something you added for the bakeoff. Try at least five different values, using either your system from 4.4 or from the bakeoff. Explain what you tried and what you found. 4 | 5 | [your answer] 6 | 7 | 8 | --------------------------------------------------------------------------------