<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>우니Blog</title>
    <link>https://ynebula.tistory.com/</link>
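    <description>A space for personal study of artificial intelligence and big data.
If you spot any mistakes, please point them out in the comments.
I will keep posting material, so please visit often.</description>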
    <language>ko</language>
    <pubDate>Tue, 23 Jun 2026 16:11:42 +0900</pubDate>
    <generator>TISTORY</generator>
    <ttl>100</ttl>
    <managingEditor>[성운]</managingEditor>
    <image>
      <title>우니Blog</title>
      <url>https://tistory1.daumcdn.net/tistory/3022057/attach/7537a95ba17f44019dd175f7a72ba125</url>
      <link>https://ynebula.tistory.com</link>
    </image>
    <item>
      <title>Charlie Munger on Buying and Growing Berkshire Hathaway</title>
      <link>https://ynebula.tistory.com/68</link>
      <description>&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=6O20S3-o6-Q&quot;&gt;https://www.youtube.com/watch?v=6O20S3-o6-Q&lt;/a&gt;&lt;/p&gt;
&lt;figure data-ke-type=&quot;video&quot; data-ke-style=&quot;alignCenter&quot; data-video-host=&quot;youtube&quot; data-video-url=&quot;https://www.youtube.com/watch?v=6O20S3-o6-Q&quot; data-video-thumbnail=&quot;https://scrap.kakaocdn.net/dn/M856z/hyWZaSQC5L/u8K1fvbkMo4IS6x4h2gDNK/img.jpg?width=1280&amp;amp;height=720&amp;amp;face=684_134_930_402&quot; data-video-width=&quot;860&quot; data-video-height=&quot;484&quot; data-video-origin-width=&quot;860&quot; data-video-origin-height=&quot;484&quot; data-ke-mobilestyle=&quot;widthContent&quot; data-video-title=&quot;촬리멍거 버크셔해서웨이 매수 및 성장&quot; data-original-url=&quot;&quot;&gt;&lt;iframe src=&quot;https://www.youtube.com/embed/6O20S3-o6-Q&quot; width=&quot;860&quot; height=&quot;484&quot; frameborder=&quot;&quot; allowfullscreen=&quot;true&quot;&gt;&lt;/iframe&gt;
&lt;figcaption style=&quot;display: none;&quot;&gt;&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&amp;nbsp;&lt;/p&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;[Meeting Warren Buffett]&lt;br /&gt;When I went back in 1959 to wind up my father's law practice, mutual friends introduced us.&lt;br /&gt;&lt;br /&gt;My father was a sole practitioner, and somebody had to go sit in his chair and wind up his practice.&lt;br /&gt;&lt;br /&gt;It was during that period that I met Warren.&lt;br /&gt;&lt;br /&gt;When I first met Warren, I recognized immediately that he was a very intelligent person.&lt;br /&gt;&lt;br /&gt;Of course, he was interested in the subject that I was also interested in, which was the process of being a successful investor.&lt;br 
/&gt;&lt;br /&gt;We have a similar sense of humor, and we had a high old time, probably making ourselves obnoxious to the other people in the room.&lt;br /&gt;&lt;br /&gt;We both came from Omaha. We both worked in his grandfather's grocery store, so we had a lot of common experience.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;[Buying Berkshire Hathaway]&lt;br /&gt;If he'd bought more of it in his partnership with you, he would have made a lot more money,&lt;br /&gt;&lt;br 
/&gt;and as it worked out, he made billions and billions of dollars for a bunch of people he didn't even know,&lt;br /&gt;&lt;br /&gt;but I don't think he can call it a mistake in that sense,&lt;br /&gt;&lt;br /&gt;and I don't think he regrets it.&lt;br /&gt;&lt;br /&gt;It's given him a public platform that's enabled him to, in effect, teach what he wants to teach.&lt;br /&gt;&lt;br /&gt;No, I think if you ask him to live his life over and say, you can go back and buy National Indemnity in your partnership instead of in Berkshire, I don't think he'd do it.&lt;br 
/&gt;&lt;br /&gt;One of the reasons Warren's successful is he's brutal in appraising his own past.&lt;br /&gt;&lt;br /&gt;He wants to identify misthinkings and avoid them in the future,&lt;br /&gt;&lt;br /&gt;and in a narrow financial sense that was misthinking,&lt;br /&gt;&lt;br /&gt;but I would say in a big sense it was fortunate misthinking because his life worked out better.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;[Picking Berkshire Hathaway]&lt;br /&gt;What happened by accident?&lt;br 
/&gt;&lt;br /&gt;He had that famous exchange where the CEO of Berkshire Hathaway tried to cheat him out of an eighth,&lt;br /&gt;&lt;br /&gt;and he got angry and said, well, to hell with you, I'll just buy more, and that was a pretty silly way to behave, as Warren has recounted in retrospect.&lt;br /&gt;&lt;br /&gt;But it's what he did, and the rest is history.&lt;br /&gt;&lt;br /&gt;It happened to make his life work better, not worse, but it was an accident that he chose Berkshire Hathaway.&lt;br /&gt;&lt;br 
/&gt;If the chairman hadn't tried to cheat him out of an eighth on an $11 price, there wouldn't have been any Buffett-Berkshire Hathaway history.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;[The blueprint of Berkshire Hathaway]&lt;br /&gt;Well, he'd made so much money for so long, doing what he'd been taught by Ben Graham, which is to buy these very cheap stocks,&lt;br /&gt;&lt;br /&gt;and if they were cheap enough, he didn't care that it was a lousy company with lousy management.&lt;br /&gt;&lt;br /&gt;He knew it was going to make money anyway just because of the cheapness,&lt;br 
/&gt;&lt;br /&gt;and I always knew that would be self-limiting, that it would only be available for a while and then it would go away,&lt;br /&gt;&lt;br /&gt;and it would be easier to make money by getting into the great businesses that either had a great manager or were businesses that a fool could run and still prosper.&lt;br /&gt;&lt;br /&gt;So I don't think I did anything but maybe cause Warren to go where he was going to go anyway a little faster.&lt;br /&gt;&lt;br 
/&gt;I don't think I changed him. I think he would have been there anyway.&lt;br /&gt;&lt;br /&gt;There was more potential for the long pull in getting into the good companies.&lt;br /&gt;&lt;br /&gt;We both wanted them cheap, but cheap good companies was the field that we shifted to,&lt;br /&gt;&lt;br /&gt;and of course that was really important when we started to buy whole companies.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;[On growing Berkshire Hathaway]&lt;br /&gt;I'll tell you how you do it.&lt;br /&gt;&lt;br 
/&gt;Have you ever seen a juggler juggle 25 milk bottles?&lt;br /&gt;&lt;br /&gt;How did he ever get to do that?&lt;br /&gt;&lt;br /&gt;The answer is he started with one bottle, then two, then three, and just kept doing it. And pretty soon he was at 25.&lt;br /&gt;&lt;br /&gt;And that's the way we did it. Now there's a limit.&lt;br /&gt;&lt;br /&gt;Maybe the guy has to stop at 25.&lt;br /&gt;&lt;br /&gt;And I don't think that is happening to us yet.&lt;br /&gt;&lt;br 
/&gt;Our return is slowing down. But Berkshire is still a collection of businesses that are above the average quality of the indexes.&lt;br /&gt;&lt;br /&gt;So it's a very respectable investment even though it can't work the kind of miracles it did when we were young.&lt;br /&gt;&lt;br /&gt;That's a source of enormous satisfaction to both of us.&lt;br /&gt;&lt;br /&gt;What are your thoughts on the future of the Berkshire juggler juggling 25 milk bottles?&lt;/p&gt;</description>
      <author>[성운]</author>
      <guid isPermaLink="true">https://ynebula.tistory.com/68</guid>
      <comments>https://ynebula.tistory.com/68#entry68comment</comments>
      <pubDate>Wed, 11 Sep 2024 21:36:27 +0900</pubDate>
    </item>
    <item>
      <title>Charlie Munger on Berkshire Hathaway: Investment Philosophy and Advice</title>
      <link>https://ynebula.tistory.com/67</link>
      <description>&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;a href=&quot;https://youtu.be/fdberGATM_8&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;https://youtu.be/fdberGATM_8&lt;/a&gt;&lt;/p&gt;
&lt;figure data-ke-type=&quot;video&quot; data-ke-style=&quot;alignCenter&quot; data-video-host=&quot;youtube&quot; data-video-url=&quot;https://www.youtube.com/watch?v=fdberGATM_8&quot; data-video-thumbnail=&quot;https://scrap.kakaocdn.net/dn/bhSW3k/hyWZnEuoFn/oBPrgTsttsfA5CEnaDs3tk/img.jpg?width=1280&amp;amp;height=720&amp;amp;face=662_142_922_426&quot; data-video-width=&quot;860&quot; data-video-height=&quot;484&quot; data-video-origin-width=&quot;860&quot; data-video-origin-height=&quot;484&quot; data-ke-mobilestyle=&quot;widthContent&quot; data-video-title=&quot;촬리멍거 버크셔해서웨이 투자 철학/조언&quot; data-original-url=&quot;&quot;&gt;&lt;iframe src=&quot;https://www.youtube.com/embed/fdberGATM_8&quot; width=&quot;860&quot; height=&quot;484&quot; frameborder=&quot;&quot; allowfullscreen=&quot;true&quot;&gt;&lt;/iframe&gt;
&lt;figcaption style=&quot;display: none;&quot;&gt;&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p data-ke-size=&quot;size16&quot;&gt;&lt;br /&gt;[The blueprint of Berkshire Hathaway] &lt;br /&gt;Well, he'd made so much money for so long, doing what he'd been taught by Ben Graham, which is to buy these very cheap stocks, &lt;br /&gt;&lt;br /&gt;and if they were cheap enough, he didn't care that it was a lousy company with lousy management. &lt;br /&gt;&lt;br /&gt;He knew it was going to make money anyway just because of the cheapness, 
&lt;br /&gt;&lt;br /&gt;and I always knew that would be self-limiting, that it would only be available for a while and then it would go away, &lt;br /&gt;&lt;br /&gt;and it would be easier to make money by getting into the great businesses that either had a great manager or were businesses that a fool could run and still prosper. &lt;br /&gt;&lt;br /&gt;So I don't think I did anything but maybe cause Warren to go where he was going to go anyway a little faster. 
&lt;br /&gt;&lt;br /&gt;I don't think I changed him. I think he would have been there anyway. &lt;br /&gt;&lt;br /&gt;There was more potential for the long pull in getting into the good companies. &lt;br /&gt;&lt;br /&gt;We both wanted them cheap, but cheap good companies was the field that we shifted to, &lt;br /&gt;&lt;br /&gt;and of course that was really important when we started to buy whole companies. 
&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;[Value investing] &lt;br /&gt;He made millions and millions of dollars value investing in lousy companies that he bought very cheaply. Besides, it's unpleasant to watch lousy companies you don't like. &lt;br /&gt;&lt;br /&gt;It's much more fun to watch somebody you like and admire succeeding than to watch some jerk run a half-mismanaged company that's very cheap. &lt;br /&gt;&lt;br /&gt;It's a better life. It's the reason we don't short stocks. 
&lt;br /&gt;&lt;br /&gt;Even if we could make a lot of money doing it, neither one of us would bother. &lt;br /&gt;&lt;br /&gt;We'd find it unpleasant. &lt;br /&gt;&lt;br /&gt;You're crazy, if you're rich, to go out and do a lot of unpleasant things you don't have to. &lt;br /&gt;&lt;br /&gt;Well, that was the most useful idea that Ben Graham ever had. &lt;br /&gt;&lt;br /&gt;Have the mindset of somebody who is buying into a business planning to hold for the long pull, and use that mindset when thinking of stocks, and neither one of us has ever departed from that one. 
&lt;br /&gt;&lt;br /&gt;[See's Candy] &lt;br /&gt;Remember, Warren had a long history of buying stocks below working capital per share, hugely cheap securities, and by definition they were all pretty lousy companies. &lt;br /&gt;&lt;br /&gt;In See's we bought a really good company. &lt;br /&gt;&lt;br /&gt;In its field it was the best in its part of California, which is pretty much all of California, and it had a wonderful product, a wonderful reputation and so on, and it had a powerful trademark, and a good culture. 
&lt;br /&gt;그&amp;nbsp;분야에서&amp;nbsp;최고였고,&amp;nbsp;그것은&amp;nbsp;캘리포니아의&amp;nbsp;일부였는데,&amp;nbsp;사실상&amp;nbsp;캘리포니아&amp;nbsp;전체였습니다.&amp;nbsp;그리고&amp;nbsp;그것은&amp;nbsp;훌륭한&amp;nbsp;제품,&amp;nbsp;훌륭한&amp;nbsp;평판&amp;nbsp;등을&amp;nbsp;가지고&amp;nbsp;있었고,&amp;nbsp;강력한&amp;nbsp;상표와&amp;nbsp;좋은&amp;nbsp;문화를&amp;nbsp;가지고&amp;nbsp;있었습니다. &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;We&amp;nbsp;bought&amp;nbsp;that&amp;nbsp;and&amp;nbsp;made&amp;nbsp;so&amp;nbsp;much&amp;nbsp;money. &lt;br /&gt;우리는&amp;nbsp;그것을&amp;nbsp;사서&amp;nbsp;엄청난&amp;nbsp;돈을&amp;nbsp;벌었습니다. &lt;br /&gt;&lt;br /&gt;It&amp;nbsp;just&amp;nbsp;was&amp;nbsp;eye-opening&amp;nbsp;how&amp;nbsp;important&amp;nbsp;these&amp;nbsp;brands&amp;nbsp;were. &lt;br /&gt;이런&amp;nbsp;브랜드들이&amp;nbsp;얼마나&amp;nbsp;중요한지&amp;nbsp;깨닫게&amp;nbsp;되는&amp;nbsp;눈이&amp;nbsp;번쩍&amp;nbsp;뜨이는&amp;nbsp;경험이었습니다. &lt;br /&gt;&lt;br /&gt;I&amp;nbsp;don't&amp;nbsp;think&amp;nbsp;that&amp;nbsp;Warren&amp;nbsp;would&amp;nbsp;have&amp;nbsp;made&amp;nbsp;all&amp;nbsp;the&amp;nbsp;money&amp;nbsp;that&amp;nbsp;Berkshire&amp;nbsp;made&amp;nbsp;in&amp;nbsp;Coca-Cola&amp;nbsp;if&amp;nbsp;he&amp;nbsp;hadn't&amp;nbsp;bought&amp;nbsp;See's. &lt;br /&gt;워렌이&amp;nbsp;시즈(See's)를&amp;nbsp;사지&amp;nbsp;않았다면,&amp;nbsp;버크셔가&amp;nbsp;코카콜라에서&amp;nbsp;번&amp;nbsp;그&amp;nbsp;모든&amp;nbsp;돈을&amp;nbsp;벌지&amp;nbsp;못했을&amp;nbsp;거라고&amp;nbsp;생각합니다. &lt;br /&gt;&lt;br /&gt;He&amp;nbsp;learned.&amp;nbsp;The&amp;nbsp;record&amp;nbsp;of&amp;nbsp;Berkshire&amp;nbsp;Hathaway&amp;nbsp;and&amp;nbsp;the&amp;nbsp;record&amp;nbsp;of&amp;nbsp;Warren&amp;nbsp;Buffett&amp;nbsp;is&amp;nbsp;a&amp;nbsp;record&amp;nbsp;based&amp;nbsp;on&amp;nbsp;continuous&amp;nbsp;learning. &lt;br /&gt;그는&amp;nbsp;배웠습니다.&amp;nbsp;버크셔&amp;nbsp;해서웨이의&amp;nbsp;실적과&amp;nbsp;워렌&amp;nbsp;버핏의&amp;nbsp;실적은&amp;nbsp;지속적인&amp;nbsp;학습을&amp;nbsp;바탕으로&amp;nbsp;한&amp;nbsp;실적입니다. 
&lt;br /&gt;&lt;br /&gt;If&amp;nbsp;he&amp;nbsp;hadn't&amp;nbsp;kept&amp;nbsp;learning&amp;nbsp;from&amp;nbsp;every&amp;nbsp;experience,&amp;nbsp;the&amp;nbsp;record&amp;nbsp;would&amp;nbsp;not&amp;nbsp;be&amp;nbsp;as&amp;nbsp;good. &lt;br /&gt;만약&amp;nbsp;그가&amp;nbsp;모든&amp;nbsp;경험에서&amp;nbsp;계속&amp;nbsp;배우지&amp;nbsp;않았다면,&amp;nbsp;그의&amp;nbsp;실적은&amp;nbsp;지금처럼&amp;nbsp;좋지&amp;nbsp;않았을&amp;nbsp;것입니다. &lt;br /&gt;&lt;br /&gt;He&amp;nbsp;learned&amp;nbsp;from&amp;nbsp;See's&amp;nbsp;that&amp;nbsp;he&amp;nbsp;should&amp;nbsp;buy&amp;nbsp;Coca-Cola. &lt;br /&gt;그는&amp;nbsp;시즈(See's)에서&amp;nbsp;배운&amp;nbsp;것을&amp;nbsp;통해&amp;nbsp;코카콜라를&amp;nbsp;사야&amp;nbsp;한다는&amp;nbsp;것을&amp;nbsp;깨달았습니다. &lt;br /&gt;&lt;br /&gt;You&amp;nbsp;really&amp;nbsp;can&amp;nbsp;understand&amp;nbsp;the&amp;nbsp;power&amp;nbsp;of&amp;nbsp;a&amp;nbsp;brand&amp;nbsp;more&amp;nbsp;when&amp;nbsp;you&amp;nbsp;buy&amp;nbsp;something&amp;nbsp;very&amp;nbsp;cheaply&amp;nbsp;and&amp;nbsp;you're&amp;nbsp;starting&amp;nbsp;to&amp;nbsp;get&amp;nbsp;300%&amp;nbsp;per&amp;nbsp;annum&amp;nbsp;on&amp;nbsp;your&amp;nbsp;investment&amp;nbsp;in&amp;nbsp;cash. &lt;br /&gt;당신이&amp;nbsp;무언가를&amp;nbsp;매우&amp;nbsp;저렴하게&amp;nbsp;사고&amp;nbsp;투자에&amp;nbsp;대해&amp;nbsp;연간&amp;nbsp;300%의&amp;nbsp;현금&amp;nbsp;수익을&amp;nbsp;얻기&amp;nbsp;시작할&amp;nbsp;때,&amp;nbsp;브랜드의&amp;nbsp;힘을&amp;nbsp;정말로&amp;nbsp;이해할&amp;nbsp;수&amp;nbsp;있습니다. &lt;br /&gt;&lt;br /&gt;That&amp;nbsp;draws&amp;nbsp;your&amp;nbsp;attention&amp;nbsp;that&amp;nbsp;a&amp;nbsp;brand&amp;nbsp;can&amp;nbsp;be&amp;nbsp;very&amp;nbsp;important. &lt;br /&gt;그것은&amp;nbsp;브랜드가&amp;nbsp;매우&amp;nbsp;중요할&amp;nbsp;수&amp;nbsp;있다는&amp;nbsp;점에&amp;nbsp;당신의&amp;nbsp;주의를&amp;nbsp;끕니다. &lt;/p&gt;</description>
      <author>[성운]</author>
      <guid isPermaLink="true">https://ynebula.tistory.com/67</guid>
      <comments>https://ynebula.tistory.com/67#entry67comment</comments>
      <pubDate>Tue, 10 Sep 2024 21:53:32 +0900</pubDate>
    </item>
    <item>
      <title>RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`</title>
      <link>https://ynebula.tistory.com/62</link>
      <description>&lt;p&gt;Robertatransformers에서 지원하는 Roberta를 기반으로 Korquad 데이터를 학습 중 입니다.&amp;nbsp; 한국어를 학습하기 위해서 Multilingual를 지원하는 XLM-RoBERTa를 사용하도록 소스를 수정했습니다.&amp;nbsp; 소스를&amp;nbsp;수정하고&amp;nbsp;&lt;a href=&quot;run_squad.py를&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;run_squad.py를&lt;/a&gt;&amp;nbsp;수행하니&amp;nbsp;다음과&amp;nbsp;같은&amp;nbsp;에러가&amp;nbsp;발생했습니다.&lt;/p&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%;&quot; border=&quot;1&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;width: 100%;&quot;&gt;/pytorch/aten/src/THC/&lt;a href=&quot;THCTensorIndex.cu:361:&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;THCTensorIndex.cu:361:&lt;/a&gt;&amp;nbsp;void&amp;nbsp;indexSelectLargeIndex(TensorInfo&amp;lt;T,&amp;nbsp;IndexType&amp;gt;,&amp;nbsp;TensorInfo&amp;lt;T,&amp;nbsp;IndexType&amp;gt;,&amp;nbsp;TensorInfo&amp;lt;long,&amp;nbsp;IndexType&amp;gt;,&amp;nbsp;int,&amp;nbsp;int,&amp;nbsp;IndexType,&amp;nbsp;IndexType,&amp;nbsp;long)&amp;nbsp;[with&amp;nbsp;T&amp;nbsp;=&amp;nbsp;float,&amp;nbsp;IndexType&amp;nbsp;=&amp;nbsp;unsigned&amp;nbsp;int,&amp;nbsp;DstDim&amp;nbsp;=&amp;nbsp;2,&amp;nbsp;SrcDim&amp;nbsp;=&amp;nbsp;2,&amp;nbsp;IdxDim&amp;nbsp;=&amp;nbsp;-2,&amp;nbsp;IndexIsMajor&amp;nbsp;=&amp;nbsp;true]:&amp;nbsp;block:&amp;nbsp;[6,0,0],&amp;nbsp;thread:&amp;nbsp;[29,0,0]&amp;nbsp;Assertion&amp;nbsp;`srcIndex&amp;nbsp;&amp;lt;&amp;nbsp;srcSelectDimSize`&amp;nbsp;failed. &lt;br /&gt;/pytorch/aten/src/THC/&lt;a href=&quot;THCTensorIndex.cu:361:&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;THCTensorIndex.cu:361:&lt;/a&gt;&amp;nbsp;void&amp;nbsp;indexSelectLargeIndex(TensorInfo&amp;lt;T,&amp;nbsp;IndexType&amp;gt;,&amp;nbsp;TensorInfo&amp;lt;T,&amp;nbsp;IndexType&amp;gt;,&amp;nbsp;TensorInfo&amp;lt;long,&amp;nbsp;IndexType&amp;gt;,&amp;nbsp;int,&amp;nbsp;int,&amp;nbsp;IndexType,&amp;nbsp;IndexType,&amp;nbsp;long)&amp;nbsp;[with&amp;nbsp;T&amp;nbsp;=&amp;nbsp;float,&amp;nbsp;IndexType&amp;nbsp;=&amp;nbsp;unsigned&amp;nbsp;int,&amp;nbsp;DstDim&amp;nbsp;=&amp;nbsp;2,&amp;nbsp;SrcDim&amp;nbsp;=&amp;nbsp;2,&amp;nbsp;IdxDim&amp;nbsp;=&amp;nbsp;-2,&amp;nbsp;IndexIsMajor&amp;nbsp;=&amp;nbsp;true]:&amp;nbsp;block:&amp;nbsp;[6,0,0],&amp;nbsp;thread:&amp;nbsp;[30,0,0]&amp;nbsp;Assertion&amp;nbsp;`srcIndex&amp;nbsp;&amp;lt;&amp;nbsp;srcSelectDimSize`&amp;nbsp;failed. 
&lt;br /&gt;/pytorch/aten/src/THC/&lt;a href=&quot;THCTensorIndex.cu:361:&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;THCTensorIndex.cu:361:&lt;/a&gt;&amp;nbsp;void&amp;nbsp;indexSelectLargeIndex(TensorInfo&amp;lt;T,&amp;nbsp;IndexType&amp;gt;,&amp;nbsp;TensorInfo&amp;lt;T,&amp;nbsp;IndexType&amp;gt;,&amp;nbsp;TensorInfo&amp;lt;long,&amp;nbsp;IndexType&amp;gt;,&amp;nbsp;int,&amp;nbsp;int,&amp;nbsp;IndexType,&amp;nbsp;IndexType,&amp;nbsp;long)&amp;nbsp;[with&amp;nbsp;T&amp;nbsp;=&amp;nbsp;float,&amp;nbsp;IndexType&amp;nbsp;=&amp;nbsp;unsigned&amp;nbsp;int,&amp;nbsp;DstDim&amp;nbsp;=&amp;nbsp;2,&amp;nbsp;SrcDim&amp;nbsp;=&amp;nbsp;2,&amp;nbsp;IdxDim&amp;nbsp;=&amp;nbsp;-2,&amp;nbsp;IndexIsMajor&amp;nbsp;=&amp;nbsp;true]:&amp;nbsp;block:&amp;nbsp;[6,0,0],&amp;nbsp;thread:&amp;nbsp;[31,0,0]&amp;nbsp;Assertion&amp;nbsp;`srcIndex&amp;nbsp;&amp;lt;&amp;nbsp;srcSelectDimSize`&amp;nbsp;failed. &lt;br /&gt;Traceback&amp;nbsp;(most&amp;nbsp;recent&amp;nbsp;call&amp;nbsp;last): &lt;br /&gt;&amp;nbsp;&amp;nbsp;File&amp;nbsp;&quot;/content/drive/My&amp;nbsp;Drive/models/transformers/examples/&lt;a href=&quot;run_squad.py&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;run_squad.py&lt;/a&gt;&quot;,&amp;nbsp;line&amp;nbsp;858,&amp;nbsp;in&amp;nbsp; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;main() &lt;br /&gt;&amp;nbsp;&amp;nbsp;File&amp;nbsp;&quot;/content/drive/My&amp;nbsp;Drive/models/transformers/examples/&lt;a href=&quot;run_squad.py&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;run_squad.py&lt;/a&gt;&quot;,&amp;nbsp;line&amp;nbsp;797,&amp;nbsp;in&amp;nbsp;main &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;global_step,&amp;nbsp;tr_loss&amp;nbsp;=&amp;nbsp;train(args,&amp;nbsp;train_dataset,&amp;nbsp;model,&amp;nbsp;tokenizer) &lt;br /&gt;&amp;nbsp;&amp;nbsp;File&amp;nbsp;&quot;/content/drive/My&amp;nbsp;Drive/models/transformers/examples/&lt;a href=&quot;run_squad.py&quot; target=&quot;_blank&quot; 
rel=&quot;noopener&quot;&gt;run_squad.py&lt;/a&gt;&quot;,&amp;nbsp;line&amp;nbsp;231,&amp;nbsp;in&amp;nbsp;train &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;outputs&amp;nbsp;=&amp;nbsp;model(**inputs) &lt;br /&gt;&amp;nbsp;&amp;nbsp;File&amp;nbsp;&quot;/usr/local/lib/python3.6/dist-packages/torch/nn/modules/&lt;a href=&quot;module.py&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;module.py&lt;/a&gt;&quot;,&amp;nbsp;line&amp;nbsp;532,&amp;nbsp;in&amp;nbsp;__call__ &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;result&amp;nbsp;=&amp;nbsp;self.forward(*input,&amp;nbsp;**kwargs) &lt;br /&gt;&amp;nbsp;&amp;nbsp;File&amp;nbsp;&quot;/usr/local/lib/python3.6/dist-packages/transformers/&lt;a href=&quot;modeling_roberta.py&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;modeling_roberta.py&lt;/a&gt;&quot;,&amp;nbsp;line&amp;nbsp;677,&amp;nbsp;in&amp;nbsp;forward &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;inputs_embeds=inputs_embeds, &lt;br /&gt;&amp;nbsp;&amp;nbsp;File&amp;nbsp;&quot;/usr/local/lib/python3.6/dist-packages/torch/nn/modules/&lt;a href=&quot;module.py&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;module.py&lt;/a&gt;&quot;,&amp;nbsp;line&amp;nbsp;532,&amp;nbsp;in&amp;nbsp;__call__ &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;result&amp;nbsp;=&amp;nbsp;self.forward(*input,&amp;nbsp;**kwargs) &lt;br /&gt;&amp;nbsp;&amp;nbsp;File&amp;nbsp;&quot;/usr/local/lib/python3.6/dist-packages/transformers/&lt;a href=&quot;modeling_bert.py&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;modeling_bert.py&lt;/a&gt;&quot;,&amp;nbsp;line&amp;nbsp;806,&amp;nbsp;in&amp;nbsp;forward &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;encoder_attention_mask=encoder_extended_attention_mask, &lt;br /&gt;&amp;nbsp;&amp;nbsp;File&amp;nbsp;&quot;/usr/local/lib/python3.6/dist-packages/torch/nn/modules/&lt;a href=&quot;module.py&quot; target=&quot;_blank&quot; 
rel=&quot;noopener&quot;&gt;module.py&lt;/a&gt;&quot;,&amp;nbsp;line&amp;nbsp;532,&amp;nbsp;in&amp;nbsp;__call__ &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;result&amp;nbsp;=&amp;nbsp;self.forward(*input,&amp;nbsp;**kwargs) &lt;br /&gt;&amp;nbsp;&amp;nbsp;File&amp;nbsp;&quot;/usr/local/lib/python3.6/dist-packages/transformers/&lt;a href=&quot;modeling_bert.py&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;modeling_bert.py&lt;/a&gt;&quot;,&amp;nbsp;line&amp;nbsp;423,&amp;nbsp;in&amp;nbsp;forward &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;hidden_states,&amp;nbsp;attention_mask,&amp;nbsp;head_mask[i],&amp;nbsp;encoder_hidden_states,&amp;nbsp;encoder_attention_mask &lt;br /&gt;&amp;nbsp;&amp;nbsp;File&amp;nbsp;&quot;/usr/local/lib/python3.6/dist-packages/torch/nn/modules/&lt;a href=&quot;module.py&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;module.py&lt;/a&gt;&quot;,&amp;nbsp;line&amp;nbsp;532,&amp;nbsp;in&amp;nbsp;__call__ &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;result&amp;nbsp;=&amp;nbsp;self.forward(*input,&amp;nbsp;**kwargs) &lt;br /&gt;&amp;nbsp;&amp;nbsp;File&amp;nbsp;&quot;/usr/local/lib/python3.6/dist-packages/transformers/&lt;a href=&quot;modeling_bert.py&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;modeling_bert.py&lt;/a&gt;&quot;,&amp;nbsp;line&amp;nbsp;384,&amp;nbsp;in&amp;nbsp;forward &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;self_attention_outputs&amp;nbsp;=&amp;nbsp;self.attention(hidden_states,&amp;nbsp;attention_mask,&amp;nbsp;head_mask) &lt;br /&gt;&amp;nbsp;&amp;nbsp;File&amp;nbsp;&quot;/usr/local/lib/python3.6/dist-packages/torch/nn/modules/&lt;a href=&quot;module.py&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;module.py&lt;/a&gt;&quot;,&amp;nbsp;line&amp;nbsp;532,&amp;nbsp;in&amp;nbsp;__call__ &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;result&amp;nbsp;=&amp;nbsp;self.forward(*input,&amp;nbsp;**kwargs) &lt;br 
/&gt;&amp;nbsp;&amp;nbsp;File&amp;nbsp;&quot;/usr/local/lib/python3.6/dist-packages/transformers/&lt;a href=&quot;modeling_bert.py&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;modeling_bert.py&lt;/a&gt;&quot;,&amp;nbsp;line&amp;nbsp;330,&amp;nbsp;in&amp;nbsp;forward &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;hidden_states,&amp;nbsp;attention_mask,&amp;nbsp;head_mask,&amp;nbsp;encoder_hidden_states,&amp;nbsp;encoder_attention_mask &lt;br /&gt;&amp;nbsp;&amp;nbsp;File&amp;nbsp;&quot;/usr/local/lib/python3.6/dist-packages/torch/nn/modules/&lt;a href=&quot;module.py&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;module.py&lt;/a&gt;&quot;,&amp;nbsp;line&amp;nbsp;532,&amp;nbsp;in&amp;nbsp;__call__ &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;result&amp;nbsp;=&amp;nbsp;self.forward(*input,&amp;nbsp;**kwargs) &lt;br /&gt;&amp;nbsp;&amp;nbsp;File&amp;nbsp;&quot;/usr/local/lib/python3.6/dist-packages/transformers/&lt;a href=&quot;modeling_bert.py&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;modeling_bert.py&lt;/a&gt;&quot;,&amp;nbsp;line&amp;nbsp;232,&amp;nbsp;in&amp;nbsp;forward &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;mixed_query_layer&amp;nbsp;=&amp;nbsp;&lt;a href=&quot;self.query(hidden_states)&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;self.query(hidden_states)&lt;/a&gt; &lt;br /&gt;&amp;nbsp;&amp;nbsp;File&amp;nbsp;&quot;/usr/local/lib/python3.6/dist-packages/torch/nn/modules/&lt;a href=&quot;module.py&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;module.py&lt;/a&gt;&quot;,&amp;nbsp;line&amp;nbsp;532,&amp;nbsp;in&amp;nbsp;__call__ &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;result&amp;nbsp;=&amp;nbsp;self.forward(*input,&amp;nbsp;**kwargs) &lt;br /&gt;&amp;nbsp;&amp;nbsp;File&amp;nbsp;&quot;/usr/local/lib/python3.6/dist-packages/torch/nn/modules/&lt;a href=&quot;linear.py&quot; target=&quot;_blank&quot; 
rel=&quot;noopener&quot;&gt;linear.py&lt;/a&gt;&quot;,&amp;nbsp;line&amp;nbsp;87,&amp;nbsp;in&amp;nbsp;forward &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return&amp;nbsp;F.linear(input,&amp;nbsp;&lt;a href=&quot;self.weight,&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;self.weight,&lt;/a&gt;&amp;nbsp;&lt;a href=&quot;self.bias)&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;self.bias)&lt;/a&gt; &lt;br /&gt;&amp;nbsp;&amp;nbsp;File&amp;nbsp;&quot;/usr/local/lib/python3.6/dist-packages/torch/nn/&lt;a href=&quot;functional.py&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;functional.py&lt;/a&gt;&quot;,&amp;nbsp;line&amp;nbsp;1372,&amp;nbsp;in&amp;nbsp;linear &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;output&amp;nbsp;=&amp;nbsp;&lt;a href=&quot;input.matmul(weight.t())&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;input.matmul(weight.t())&lt;/a&gt; &lt;br /&gt;RuntimeError:&amp;nbsp;CUDA&amp;nbsp;error:&amp;nbsp;CUBLAS_STATUS_ALLOC_FAILED&amp;nbsp;when&amp;nbsp;calling&amp;nbsp;`cublasCreate(handle)`&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
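&lt;p&gt;Judging by the error message, something goes wrong right at the start: the srcIndex &amp;lt; srcSelectDimSize assertions fail before the final cuBLAS error is raised.&lt;/p&gt;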
&lt;p&gt;Nothing looked seriously wrong during unit testing, so this cost me several more rounds of trial and error. &lt;br /&gt;While rechecking every change from the beginning, &lt;span style=&quot;color: #ee2323;&quot;&gt;I found that config.json was still the RoBERTa one.&lt;/span&gt; &lt;br /&gt;Early on, when I was testing loading the model by path rather than by model_type, I had used RoBERTa. &lt;br /&gt;&lt;span style=&quot;color: #ee2323;&quot;&gt;After switching this config file to the XLMRobertaConfig version, training ran normally.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;span style=&quot;color: #000000;&quot;&gt;&lt;span style=&quot;color: #333333;&quot;&gt;Comparing the two configs, I found that vocab_size differed. Most likely the XLM-RoBERTa tokenizer produced token ids larger than the embedding table built from the RoBERTa config, so the embedding lookup went out of range.&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;span style=&quot;color: #000000;&quot;&gt;&lt;span style=&quot;color: #333333;&quot;&gt;RobertaConfig - &quot;vocab_size&quot;: 50265 &lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style=&quot;color: #000000;&quot;&gt;&lt;span style=&quot;color: #333333;&quot;&gt;XLMRobertaConfig - &quot;vocab_size&quot;: 250002&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
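&lt;p&gt;The mismatch can be illustrated with a minimal pure-Python sketch. The helper out_of_range_ids and the sample ids are hypothetical and not part of transformers or run_squad.py; only the two vocab_size values come from the configs in this post. It shows which token ids would not fit an embedding table sized from the RoBERTa config.&lt;/p&gt;

```python
# Hypothetical helper (not part of transformers): find token ids that would
# fall outside an embedding table with `vocab_size` rows.
def out_of_range_ids(input_ids, vocab_size):
    """Return the ids that cannot be looked up in a table of vocab_size rows."""
    return [i for i in input_ids if i >= vocab_size or 0 > i]


ROBERTA_VOCAB_SIZE = 50265       # vocab_size in the RobertaConfig
XLM_ROBERTA_VOCAB_SIZE = 250002  # vocab_size in the XLMRobertaConfig

# Made-up ids standing in for what a tokenizer with 250002 entries might emit.
sample_ids = [0, 6, 124000, 249999, 2]

bad_for_roberta = out_of_range_ids(sample_ids, ROBERTA_VOCAB_SIZE)
bad_for_xlmr = out_of_range_ids(sample_ids, XLM_ROBERTA_VOCAB_SIZE)
print(bad_for_roberta)  # ids that would not fit the smaller table
print(bad_for_xlmr)     # ids that would not fit the larger table
```

&lt;p&gt;Running a check like this on a CPU batch before training is one way to catch the problem early, since the GPU-side assertion is much harder to read.&lt;/p&gt;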
&lt;p&gt;&lt;span style=&quot;color: #333333;&quot;&gt;For reference, the contents of RobertaConfig and XLMRobertaConfig are included here.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style=&quot;color: #000000;&quot;&gt;&lt;span style=&quot;color: #333333;&quot;&gt;RobertaConfig&amp;nbsp;&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%;&quot; border=&quot;1&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;width: 100%;&quot;&gt;{ &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;architectures&quot;:&amp;nbsp;[ &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&quot;RobertaForMaskedLM&quot; &lt;br /&gt;&amp;nbsp;&amp;nbsp;], &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;attention_probs_dropout_prob&quot;:&amp;nbsp;0.1, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;bos_token_id&quot;:&amp;nbsp;0, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;do_sample&quot;:&amp;nbsp;false, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;eos_token_ids&quot;:&amp;nbsp;0, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;finetuning_task&quot;:&amp;nbsp;null, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;hidden_act&quot;:&amp;nbsp;&quot;gelu&quot;, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;hidden_dropout_prob&quot;:&amp;nbsp;0.1, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;hidden_size&quot;:&amp;nbsp;768, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;id2label&quot;:&amp;nbsp;{ &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&quot;0&quot;:&amp;nbsp;&quot;LABEL_0&quot;, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&quot;1&quot;:&amp;nbsp;&quot;LABEL_1&quot; &lt;br /&gt;&amp;nbsp;&amp;nbsp;}, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;initializer_range&quot;:&amp;nbsp;0.02, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;intermediate_size&quot;:&amp;nbsp;3072, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;is_decoder&quot;:&amp;nbsp;false, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;label2id&quot;:&amp;nbsp;{ &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&quot;LABEL_0&quot;:&amp;nbsp;0, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&quot;LABEL_1&quot;:&amp;nbsp;1 &lt;br /&gt;&amp;nbsp;&amp;nbsp;}, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;layer_norm_eps&quot;:&amp;nbsp;1e-05, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;length_penalty&quot;:&amp;nbsp;1.0, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;max_length&quot;:&amp;nbsp;20, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;max_position_embeddings&quot;:&amp;nbsp;514, &lt;br 
/&gt;&amp;nbsp;&amp;nbsp;&quot;model_type&quot;:&amp;nbsp;&quot;roberta&quot;, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;num_attention_heads&quot;:&amp;nbsp;12, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;num_beams&quot;:&amp;nbsp;1, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;num_hidden_layers&quot;:&amp;nbsp;12, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;num_labels&quot;:&amp;nbsp;2, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;num_return_sequences&quot;:&amp;nbsp;1, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;output_attentions&quot;:&amp;nbsp;false, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;output_hidden_states&quot;:&amp;nbsp;false, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;output_past&quot;:&amp;nbsp;true, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;pad_token_id&quot;:&amp;nbsp;0, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;pruned_heads&quot;:&amp;nbsp;{}, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;repetition_penalty&quot;:&amp;nbsp;1.0, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;temperature&quot;:&amp;nbsp;1.0, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;top_k&quot;:&amp;nbsp;50, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;top_p&quot;:&amp;nbsp;1.0, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;torchscript&quot;:&amp;nbsp;false, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;type_vocab_size&quot;:&amp;nbsp;1, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;use_bfloat16&quot;:&amp;nbsp;false, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&lt;span style=&quot;color: #ee2323;&quot;&gt;&quot;vocab_size&quot;:&amp;nbsp;50265&lt;/span&gt; &lt;br /&gt;}&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;span style=&quot;color: #000000;&quot;&gt;XLMRobertaConfig&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%;&quot; border=&quot;1&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;width: 100%;&quot;&gt;{ &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;architectures&quot;:&amp;nbsp;[ &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&quot;XLMRobertaForMaskedLM&quot; &lt;br /&gt;&amp;nbsp;&amp;nbsp;], &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;attention_probs_dropout_prob&quot;:&amp;nbsp;0.1, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;bos_token_id&quot;:&amp;nbsp;0, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;do_sample&quot;:&amp;nbsp;false, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;eos_token_ids&quot;:&amp;nbsp;0, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;finetuning_task&quot;:&amp;nbsp;null, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;hidden_act&quot;:&amp;nbsp;&quot;gelu&quot;, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;hidden_dropout_prob&quot;:&amp;nbsp;0.1, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;hidden_size&quot;:&amp;nbsp;768, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;id2label&quot;:&amp;nbsp;{ &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&quot;0&quot;:&amp;nbsp;&quot;LABEL_0&quot;, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&quot;1&quot;:&amp;nbsp;&quot;LABEL_1&quot; &lt;br /&gt;&amp;nbsp;&amp;nbsp;}, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;initializer_range&quot;:&amp;nbsp;0.02, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;intermediate_size&quot;:&amp;nbsp;3072, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;is_decoder&quot;:&amp;nbsp;false, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;label2id&quot;:&amp;nbsp;{ &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&quot;LABEL_0&quot;:&amp;nbsp;0, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&quot;LABEL_1&quot;:&amp;nbsp;1 &lt;br /&gt;&amp;nbsp;&amp;nbsp;}, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;layer_norm_eps&quot;:&amp;nbsp;1e-05, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;length_penalty&quot;:&amp;nbsp;1.0, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;max_length&quot;:&amp;nbsp;20, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;max_position_embeddings&quot;:&amp;nbsp;514, &lt;br 
/&gt;&amp;nbsp;&amp;nbsp;&quot;model_type&quot;:&amp;nbsp;&quot;xlm-roberta&quot;, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;num_attention_heads&quot;:&amp;nbsp;12, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;num_beams&quot;:&amp;nbsp;1, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;num_hidden_layers&quot;:&amp;nbsp;12, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;num_labels&quot;:&amp;nbsp;2, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;num_return_sequences&quot;:&amp;nbsp;1, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;output_attentions&quot;:&amp;nbsp;false, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;output_hidden_states&quot;:&amp;nbsp;false, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;output_past&quot;:&amp;nbsp;true, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;pad_token_id&quot;:&amp;nbsp;0, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;pruned_heads&quot;:&amp;nbsp;{}, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;repetition_penalty&quot;:&amp;nbsp;1.0, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;temperature&quot;:&amp;nbsp;1.0, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;top_k&quot;:&amp;nbsp;50, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;top_p&quot;:&amp;nbsp;1.0, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;torchscript&quot;:&amp;nbsp;false, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;type_vocab_size&quot;:&amp;nbsp;1, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;use_bfloat16&quot;:&amp;nbsp;false, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&quot;vocab_size&quot;:&amp;nbsp;250002 &lt;br /&gt;}&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;span style=&quot;color: #000000;&quot;&gt;If this post helped, please leave a like. &lt;br /&gt;&lt;br /&gt;Thank you.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>Error</category>
      <category>error</category>
      <author>[성운]</author>
      <guid isPermaLink="true">https://ynebula.tistory.com/62</guid>
      <comments>https://ynebula.tistory.com/62#entry62comment</comments>
      <pubDate>Wed, 19 Feb 2020 21:13:10 +0900</pubDate>
    </item>
    <item>
      <title>BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding 논문 한글 번역 - C Additional Ablation Studies</title>
      <link>https://ynebula.tistory.com/61</link>
      <description>&lt;p style=&quot;text-align: center;&quot;&gt;&lt;b&gt;&lt;b&gt;&lt;b&gt;&lt;b&gt;&lt;span&gt;&lt;span&gt;BERT논문을 직역 및 의역으로 작성한 내용입니다.&lt;/span&gt;&lt;/span&gt;&lt;/b&gt;&lt;/b&gt;&lt;/b&gt;&lt;/b&gt;&lt;/p&gt;
&lt;p style=&quot;text-align: center;&quot;&gt;&lt;b&gt;&lt;span&gt;&lt;span&gt;A Additional Details for BERT&lt;/span&gt;&amp;nbsp;는 다음 컨텐츠를 이용바랍니다.&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;
&lt;p style=&quot;text-align: center;&quot;&gt;&lt;b&gt;&lt;span&gt;&lt;a href=&quot;https://ynebula.tistory.com/60&quot;&gt;https://ynebula.tistory.com/60&lt;/a&gt;&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;C&amp;nbsp;Additional&amp;nbsp;Ablation&amp;nbsp;Studies&lt;/b&gt;&lt;br /&gt;&lt;b&gt;&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-origin-width=&quot;0&quot; data-origin-height=&quot;0&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/blKhKv/btqBVfm55Gr/z7raK3wC5dtT2QbPrWopqk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/blKhKv/btqBVfm55Gr/z7raK3wC5dtT2QbPrWopqk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/blKhKv/btqBVfm55Gr/z7raK3wC5dtT2QbPrWopqk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FblKhKv%2FbtqBVfm55Gr%2Fz7raK3wC5dtT2QbPrWopqk%2Fimg.png&quot; data-origin-width=&quot;0&quot; data-origin-height=&quot;0&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;&lt;b&gt;C.1&amp;nbsp;Effect&amp;nbsp;of&amp;nbsp;Number&amp;nbsp;of&amp;nbsp;Training&amp;nbsp;Steps&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;Figure 5 shows MNLI Dev accuracy after fine-tuning from a checkpoint that has been pre-trained for k steps.&lt;br /&gt;This lets us answer the following questions.&lt;br /&gt;1. Question&lt;br /&gt;Does BERT really need such a large amount of pre-training (128,000 words/batch * 1,000,000 steps) to reach high fine-tuning accuracy?&lt;br /&gt;Answer: Yes. On MNLI, BERTBASE gains almost 1.0% additional accuracy when trained for 1M steps compared with 500k steps (k = 1,000 / M = 1,000,000).&lt;br /&gt;&lt;br /&gt;2. Question&lt;br /&gt;Does MLM pre-training converge more slowly than LTR pre-training, since only 15% of the words in each batch are predicted?&lt;br /&gt;Answer: The MLM model does converge slightly more slowly than the LTR model, but in terms of absolute accuracy it begins to outperform the LTR model almost immediately.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;C.2 Ablation for Different Masking Procedures&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-origin-width=&quot;0&quot; data-origin-height=&quot;0&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cBSzCk/btqBVNcQL4n/saZkZl8VF5dQNcSOAWQHr0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cBSzCk/btqBVNcQL4n/saZkZl8VF5dQNcSOAWQHr0/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cBSzCk/btqBVNcQL4n/saZkZl8VF5dQNcSOAWQHr0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcBSzCk%2FbtqBVNcQL4n%2FsaZkZl8VF5dQNcSOAWQHr0%2Fimg.png&quot; data-origin-width=&quot;0&quot; data-origin-height=&quot;0&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;As mentioned in Section 3.1, BERT uses a mixed strategy for masking the target tokens when pre-training with the masked language model (MLM) objective. What follows is an ablation study that evaluates the effect of different masking strategies.&lt;br /&gt;&lt;br /&gt;The purpose of the masking strategies is to reduce the mismatch between pre-training and fine-tuning, since the [MASK] symbol never appears during fine-tuning. We report Dev results for both MNLI and NER. For NER, we report both the fine-tuning and the feature-based approaches, because we expect the mismatch to be amplified in the feature-based approach, where the model has no chance to adjust its representations.&lt;/p&gt;
&lt;p&gt;The results are presented in Table 8. In the table, MASK means the target token is replaced with the [MASK] symbol.&lt;br /&gt;SAME means the target token is kept as is. RND means the target token is replaced with a random token.&lt;br /&gt;The numbers in the left part of the table are the probabilities of each strategy used during MLM pre-training (BERT uses 80%, 10%, 10%). The right part shows the Dev set results. For the feature-based approach, we concatenate the last 4 layers of BERT as the features, which was shown to be the best approach in Section 5.3.&lt;br /&gt;The table shows that fine-tuning is robust to different masking strategies. As expected, however, using only the MASK strategy was problematic when applying the feature-based approach to NER. Interestingly, using only the RND strategy also performs worse than our strategy.&lt;/p&gt;
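&lt;p&gt;The MASK / SAME / RND mix described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: corrupt_token, the toy vocabulary, and the fixed seed are all hypothetical, and here the random replacement may happen to pick the original token again.&lt;/p&gt;

```python
import random

# Probabilities of the three strategies in the BERT setting (80%, 10%, 10%).
BERT_STRATEGY_WEIGHTS = {"MASK": 0.8, "SAME": 0.1, "RND": 0.1}


def corrupt_token(token, vocab, weights, rng):
    """Apply one randomly chosen strategy to a token already selected for prediction."""
    strategy = rng.choices(list(weights), weights=list(weights.values()), k=1)[0]
    if strategy == "MASK":
        return "[MASK]"          # replace the target token with the mask symbol
    if strategy == "RND":
        return rng.choice(vocab)  # replace with a random token (may equal the original)
    return token                 # SAME: keep the target token unchanged


rng = random.Random(0)  # fixed seed, only so repeated runs behave the same
toy_vocab = ["my", "dog", "is", "hairy", "apple"]  # toy vocabulary (assumption)

# Corrupt the same target token many times and measure how often it is masked.
samples = [corrupt_token("hairy", toy_vocab, BERT_STRATEGY_WEIGHTS, rng) for _ in range(10000)]
mask_share = samples.count("[MASK]") / len(samples)
print(round(mask_share, 2))
```

&lt;p&gt;Changing the weights dictionary, for example to use only MASK or only RND, gives the other rows of the ablation as corruption settings.&lt;/p&gt;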
&lt;p&gt;&lt;b&gt;6. Conclusion&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;Recent empirical improvements from transfer learning with language models have shown that rich, unsupervised pre-training is an integral part of many language understanding systems. In particular, these results allow even low-resource tasks to benefit from deep unidirectional architectures. Our main contribution is to generalize these findings further to deep bidirectional architectures, so that the same pre-trained model can successfully handle a broad set of NLP tasks.&lt;/p&gt;</description>
      <category>논문분석</category>
      <author>[성운]</author>
      <guid isPermaLink="true">https://ynebula.tistory.com/61</guid>
      <comments>https://ynebula.tistory.com/61#entry61comment</comments>
      <pubDate>Tue, 11 Feb 2020 20:48:58 +0900</pubDate>
    </item>
    <item>
      <title>BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding 논문 한글 번역 - Additional Details for BERT</title>
      <link>https://ynebula.tistory.com/60</link>
      <description>&lt;p style=&quot;text-align: center;&quot;&gt;&lt;b&gt;&lt;b&gt;&lt;b&gt;&lt;span&gt;&lt;span&gt;This post is a mix of literal and free translation of the BERT paper.&lt;/span&gt;&lt;/span&gt;&lt;/b&gt;&lt;/b&gt;&lt;/b&gt;&lt;/p&gt;
&lt;p style=&quot;text-align: center;&quot;&gt;&lt;b&gt;&lt;span&gt;For &lt;b&gt;5 Ablation Studies&lt;/b&gt;, please see the following post.&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;
&lt;p style=&quot;text-align: center;&quot;&gt;&lt;b&gt;&lt;span&gt;&lt;a href=&quot;https://ynebula.tistory.com/59&quot;&gt;https://ynebula.tistory.com/59&lt;/a&gt;&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;Appendix for &quot;BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding&quot;&lt;br /&gt;The appendix is organized into three sections.&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;Appendix A: additional implementation details for BERT.&lt;/li&gt;
&lt;li&gt;Appendix B: additional details for our experiments.&lt;/li&gt;
&lt;li&gt;Appendix C: additional ablation studies&lt;br /&gt;&amp;nbsp;: Effect of Number of Training Steps&lt;br /&gt;&amp;nbsp;: Ablation for Different Masking Procedures&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;b&gt;A Additional Details for BERT&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;A.1 Illustration of the Pre-training Tasks &lt;/b&gt;&lt;/p&gt;
&lt;p&gt;We provide examples of the pre-training tasks below.&lt;br /&gt;&lt;b&gt;Masked LM and the Masking Procedure&lt;/b&gt;&lt;br /&gt;Assume the unlabeled sentence is my dog is hairy, and that during the random masking procedure we chose the 4-th token (which corresponds to hairy). The masking procedure can then be illustrated as follows.&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;span style=&quot;letter-spacing: 0px;&quot;&gt;80% of the time: replace the word with the [MASK] token&lt;/span&gt;, e.g., my dog is hairy -&amp;gt; my dog is [MASK]&lt;/li&gt;
&lt;li&gt;&lt;span style=&quot;letter-spacing: 0px;&quot;&gt;10% of the time: replace the word with a random word&lt;/span&gt;, e.g., my dog is hairy -&amp;gt; my dog is apple&lt;/li&gt;
&lt;li&gt;&lt;span style=&quot;letter-spacing: 0px;&quot;&gt;10% of the time: keep the word unchanged&lt;/span&gt;, e.g., my dog is hairy -&amp;gt; my dog is hairy&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The advantage of this procedure is that the Transformer encoder does not know which words have been replaced by random words, so it is forced to keep a distributional contextual representation of every input token. In addition, random replacement only occurs for 1.5% of all tokens (i.e., 10% of 15%), so it does not seem to harm the model's language understanding capability.&lt;br /&gt;In Section C.2, we evaluate the impact of this procedure. Compared to standard language model training, the masked LM only makes predictions on 15% of the tokens in each batch, which suggests that more pre-training steps may be required for the model to converge. In Section C.1, we show that MLM converges marginally slower than a left-to-right model.&lt;/p&gt;
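&lt;p&gt;The 80/10/10 rule above can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: the function name mask_tokens, the toy vocabulary, and the independent 15% coin flip per token are assumptions made for the example.&lt;/p&gt;
&lt;pre&gt;
```python
import random

MASK_TOKEN = "[MASK]"
SPECIAL_TOKENS = ("[CLS]", "[SEP]")


def mask_tokens(tokens, vocab, select_prob=0.15, rng=None):
    """Apply the 80/10/10 masking rule to a list of tokens.

    Returns the corrupted token list and the positions chosen for prediction.
    The per-token coin flip with select_prob is an assumption of this sketch.
    """
    rng = rng or random.Random()
    corrupted = list(tokens)
    targets = []
    for i, token in enumerate(tokens):
        # Special tokens are never chosen as prediction targets.
        if token in SPECIAL_TOKENS:
            continue
        # Skip the token unless it is selected (probability select_prob).
        if rng.random() >= select_prob:
            continue
        targets.append(i)
        roll = rng.random()
        if roll >= 0.9:
            pass  # 10% of the time: keep the original token
        elif roll >= 0.8:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random word
        else:
            corrupted[i] = MASK_TOKEN  # 80%: replace with [MASK]
    return corrupted, targets


# Share of all tokens that end up randomly replaced: 10% of 15%.
RANDOM_REPLACEMENT_RATE = 0.15 * 0.10
```
&lt;/pre&gt;
&lt;p&gt;Calling mask_tokens on the tokens of my dog is hairy with a seeded random.Random makes the corruption reproducible, which helps when checking which positions were selected.&lt;/p&gt;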
&lt;p&gt;&lt;b&gt;Next Sentence Prediction&lt;/b&gt;&lt;br /&gt;The next sentence prediction task can be illustrated with the following examples.&lt;/p&gt;
&lt;table style=&quot;border-collapse: collapse; width: 100%;&quot; border=&quot;1&quot;&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;width: 100%;&quot;&gt;&lt;span style=&quot;color: #333333;&quot;&gt;Input = [CLS] the man went to [MASK] store [SEP] he&amp;nbsp;bought&amp;nbsp;a&amp;nbsp;gallon&amp;nbsp;[MASK]&amp;nbsp;milk&amp;nbsp;[SEP]&lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;color: #333333;&quot;&gt;Label&amp;nbsp;=&amp;nbsp;IsNext&lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;color: #333333;&quot;&gt;Input = [CLS] the man [MASK] to the store [SEP] penguin&amp;nbsp;[MASK]&amp;nbsp;are&amp;nbsp;flight&amp;nbsp;##less&amp;nbsp;birds&amp;nbsp;[SEP]&lt;/span&gt;&lt;br /&gt;&lt;span style=&quot;color: #333333;&quot;&gt;Label&amp;nbsp;=&amp;nbsp;NotNext&lt;/span&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
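&lt;p&gt;The two examples above follow the pattern [CLS] A [SEP] B [SEP] plus a label. A rough sketch of how such pairs could be built is shown below. The helper names make_nsp_example and pack are hypothetical, whitespace splitting stands in for WordPiece tokenization, and drawing the random sentence from the same list is a simplifying assumption.&lt;/p&gt;
&lt;pre&gt;
```python
import random


def make_nsp_example(sentences, index, rng):
    """Build one (sentence A, sentence B, label) triple from an ordered sentence list.

    `index` must not point at the last sentence, and the list needs at least
    three sentences so that a non-adjacent sentence is available.
    """
    sentence_a = sentences[index]
    if rng.random() >= 0.5:
        # Half of the time, B is the sentence that actually follows A.
        return sentence_a, sentences[index + 1], "IsNext"
    # Otherwise, B is a random sentence other than A and its true successor.
    candidates = [s for j, s in enumerate(sentences) if j not in (index, index + 1)]
    return sentence_a, rng.choice(candidates), "NotNext"


def pack(sentence_a, sentence_b):
    """Join two sentences into a single [CLS] A [SEP] B [SEP] token sequence."""
    return ["[CLS]"] + sentence_a.split() + ["[SEP]"] + sentence_b.split() + ["[SEP]"]
```
&lt;/pre&gt;
&lt;p&gt;Masking, as in the examples above, would be applied to the packed sequence afterwards.&lt;/p&gt;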
&lt;p&gt;&lt;br /&gt;&lt;b&gt;A.2 Pre-training Procedure&lt;/b&gt;&lt;br /&gt;To generate each training input sequence, we sample two spans of text from the corpus. The first receives the sentence A embedding and the second receives the sentence B embedding. 50% of the time B is the actual next sentence that follows A, and 50% of the time it is a random sentence. The spans are sampled so that the combined length is at most 512 tokens. The LM masking is applied after WordPiece tokenization with a uniform masking rate of 15%,&lt;br /&gt;and no special consideration is given to partial word pieces. We train with a batch size of 256 sequences (256 sequences * 512 tokens = 128,000 tokens/batch) for 1,000,000 steps, which is approximately 40 epochs over the 3.3 billion word corpus. We use Adam with a learning rate of 1e-4, &amp;beta;&lt;sub&gt;1&lt;/sub&gt; = 0.9, &amp;beta;&lt;sub&gt;2&lt;/sub&gt; = 0.999, L2 weight decay of 0.01, learning rate warmup over the first 10,000 steps, and linear decay of the learning rate. We use a dropout probability of 0.1 on all layers. We use a gelu activation rather than the standard relu, following OpenAI GPT. The training loss is the sum of the mean masked LM likelihood and the mean next sentence prediction likelihood.&lt;/p&gt;
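&lt;p&gt;The warmup and linear decay schedule described above can be written as a small function. This is a sketch under stated assumptions: the function name learning_rate is hypothetical, and the decay is assumed to start from the peak value right after warmup and reach zero at the final step, which may differ in detail from the original training code.&lt;/p&gt;
&lt;pre&gt;
```python
def learning_rate(step, peak_lr=1e-4, warmup_steps=10_000, total_steps=1_000_000):
    """Linear warmup to peak_lr, then linear decay to zero at total_steps."""
    if step >= warmup_steps:
        # Decay phase: fraction of the post-warmup steps that are still left.
        remaining = max(total_steps - step, 0)
        return peak_lr * remaining / (total_steps - warmup_steps)
    # Warmup phase: scale linearly from 0 up to peak_lr.
    return peak_lr * step / warmup_steps


# Tokens per batch for 256 sequences of 512 tokens. The exact product is
# 131,072; the figure of 128,000 in the text is presumably a rounded form.
tokens_per_batch = 256 * 512
```
&lt;/pre&gt;
&lt;p&gt;With these defaults the rate is 5e-5 halfway through warmup (step 5,000) and peaks at 1e-4 at step 10,000.&lt;/p&gt;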
&lt;p&gt;Training of BERT&lt;sub&gt;BASE&lt;/sub&gt; was performed on 4 Cloud TPUs in Pod configuration (16 TPU chips total). Training of BERT&lt;sub&gt;LARGE&lt;/sub&gt; was performed on 16 Cloud TPUs (64 TPU chips total). Each pre-training run took 4 days to complete. Longer sequences are disproportionately expensive because attention is quadratic in the sequence length. To speed up pre-training, we pre-train the model with a sequence length of 128 for 90% of the steps. Then we train the remaining 10% of the steps with a sequence length of 512 to learn the positional embeddings.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;A.3 Fine-tuning Procedure&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;For fine-tuning, most model hyperparameters are the same as in pre-training, with the exception of the batch size, learning rate, and number of training epochs.&lt;br /&gt;The dropout probability was always kept at 0.1. The optimal hyperparameter values are task-specific, but we found the following range of possible values to work well across all tasks.&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;span style=&quot;letter-spacing: 0px;&quot;&gt;Batch size: 16, 32&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style=&quot;letter-spacing: 0px;&quot;&gt;Learning rate (Adam): 5e-5, 3e-5, 2e-5&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style=&quot;letter-spacing: 0px;&quot;&gt;Number of epochs: 2, 3, 4&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
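&lt;p&gt;Because the ranges above are small, every combination can simply be tried. The sketch below assumes a caller-supplied train_and_eval function (hypothetical) that fine-tunes with one configuration and returns a development-set score. The value lists follow the ranges given in the BERT paper: batch size 16 or 32, learning rate 5e-5, 3e-5 or 2e-5, and 2 to 4 epochs.&lt;/p&gt;
&lt;pre&gt;
```python
from itertools import product

BATCH_SIZES = [16, 32]
LEARNING_RATES = [5e-5, 3e-5, 2e-5]
EPOCHS = [2, 3, 4]


def grid_search(train_and_eval):
    """Try every combination and return (best_score, best_config).

    `train_and_eval` is a placeholder for a function that fine-tunes a model
    with the given config and returns its development-set score.
    """
    best = None
    for batch_size, lr, epochs in product(BATCH_SIZES, LEARNING_RATES, EPOCHS):
        config = {"batch_size": batch_size, "learning_rate": lr, "epochs": epochs}
        score = train_and_eval(config)
        # Keep the configuration with the highest development-set score.
        if best is None or score > best[0]:
            best = (score, config)
    return best
```
&lt;/pre&gt;
&lt;p&gt;The grid has 2 * 3 * 3 = 18 configurations, which is affordable when each fine-tuning run is short.&lt;/p&gt;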
&lt;p&gt;We also observed that large data sets (e.g., 100k+ labeled training examples) are less sensitive to hyperparameter choice. Fine-tuning is typically fast, so it is reasonable to simply run an exhaustive search over the above parameters and choose the model that performs best on the development set.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;A.4 Comparison of BERT, ELMo and OpenAI GPT&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-origin-width=&quot;0&quot; data-origin-height=&quot;0&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bVGEu5/btqBS56TC1z/UdeQIHNv1O5EZkS6tEI37k/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bVGEu5/btqBS56TC1z/UdeQIHNv1O5EZkS6tEI37k/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bVGEu5/btqBS56TC1z/UdeQIHNv1O5EZkS6tEI37k/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbVGEu5%2FbtqBS56TC1z%2FUdeQIHNv1O5EZkS6tEI37k%2Fimg.png&quot; data-origin-width=&quot;0&quot; data-origin-height=&quot;0&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;Here we look at the differences among recent popular representation learning models, including ELMo, OpenAI GPT and BERT. The comparison between the model architectures is shown in Figure 3. BERT and OpenAI GPT are fine-tuning approaches, while ELMo is a feature-based approach. The existing pre-training method most comparable to BERT is OpenAI GPT, which trains a left-to-right Transformer LM on a large text corpus. Many of the design decisions in BERT were intentionally made as close to GPT as possible so that the two methods could be compared.&lt;br /&gt;The core argument of this work is that the bi-directionality and the two pre-training tasks presented in Section 3.1 account for the majority of the empirical improvements.&lt;br /&gt;However, there are several other differences between how BERT and GPT were trained:&lt;/p&gt;
&lt;ul style=&quot;list-style-type: disc;&quot; data-ke-list-type=&quot;disc&quot;&gt;
&lt;li&gt;&lt;span style=&quot;letter-spacing: 0px;&quot;&gt;GPT is trained on the BooksCorpus (800M words). &lt;/span&gt;&lt;span style=&quot;letter-spacing: 0px;&quot;&gt;BERT is trained on the BooksCorpus (800M words) and Wikipedia (2,500M words).&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style=&quot;letter-spacing: 0px;&quot;&gt;GPT uses a sentence separator ([SEP]) and a classifier token ([CLS]) that are only introduced at fine-tuning time. &lt;/span&gt;&lt;span style=&quot;letter-spacing: 0px;&quot;&gt;BERT learns [SEP], [CLS] and the sentence A/B embeddings during pre-training.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style=&quot;letter-spacing: 0px;&quot;&gt;GPT was trained for 1M steps with a batch size of 32,000 words. &lt;/span&gt;&lt;span style=&quot;letter-spacing: 0px;&quot;&gt;BERT was trained for 1M steps with a batch size of 128,000 words.&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span style=&quot;letter-spacing: 0px;&quot;&gt;GPT used the same learning rate of 5e-5 for all fine-tuning experiments. &lt;/span&gt;&lt;span style=&quot;letter-spacing: 0px;&quot;&gt;BERT chooses a task-specific fine-tuning learning rate that performs best on the development set.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To isolate the effect of these differences, we perform ablation experiments in Section 5.1. They show that the majority of the improvements come from the two pre-training tasks and the bidirectionality they enable.&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;&lt;b&gt;A.5 Illustration of Fine-tuning on Different Tasks&lt;/b&gt;&lt;br /&gt;The illustration of fine-tuning BERT on different tasks can be seen in Figure 4. Our task-specific models are formed by incorporating BERT with one additional output layer, so only a minimal number of parameters need to be learned from scratch. Among the tasks, (a) and (b) are sequence-level tasks, while (c) and (d) are token-level tasks. In the figure, E represents the input embedding, T&lt;sub&gt;i&lt;/sub&gt; represents the contextual representation of token i, [CLS] is the special symbol for classification output, and [SEP] is the special symbol that separates non-consecutive token sequences.&lt;/p&gt;
&lt;p style=&quot;text-align: center;&quot;&gt;&lt;b&gt;&lt;span&gt;For &lt;b&gt;C Additional Ablation Studies&lt;/b&gt;, please see the following post.&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;
&lt;p style=&quot;text-align: center;&quot;&gt;&lt;a href=&quot;https://ynebula.tistory.com/61&quot;&gt;https://ynebula.tistory.com/61&lt;/a&gt;&lt;/p&gt;</description>
      <category>논문분석</category>
      <author>[성운]</author>
      <guid isPermaLink="true">https://ynebula.tistory.com/60</guid>
      <comments>https://ynebula.tistory.com/60#entry60comment</comments>
      <pubDate>Tue, 11 Feb 2020 20:32:36 +0900</pubDate>
    </item>
    <item>
      <title>BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Paper Translation) - 5</title>
      <link>https://ynebula.tistory.com/59</link>
      <description>&lt;p style=&quot;text-align: center;&quot;&gt;&lt;b&gt;&lt;b&gt;&lt;b&gt;&lt;span&gt;&lt;span&gt;This post is a mix of literal and free translation of the BERT paper.&lt;/span&gt;&lt;/span&gt;&lt;/b&gt;&lt;/b&gt;&lt;/b&gt;&lt;/p&gt;
&lt;p style=&quot;text-align: center;&quot;&gt;&lt;b&gt;&lt;span&gt;For 4 Experiments, please see the following post.&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;
&lt;p style=&quot;text-align: center;&quot;&gt;&lt;b&gt;&lt;span&gt;&lt;a href=&quot;https://ynebula.tistory.com/58&quot;&gt;https://ynebula.tistory.com/58&lt;/a&gt;&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;5 Ablation Studies&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;In this section, we perform ablation experiments over a number of facets of BERT in order to better understand their relative importance. Additional ablation studies can be found in Appendix C.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;5.1 Effect of Pre-training Tasks&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-origin-width=&quot;0&quot; data-origin-height=&quot;0&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/ylgls/btqBD1pT6m4/EnltdX7hhPtn5YGBReoYs0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/ylgls/btqBD1pT6m4/EnltdX7hhPtn5YGBReoYs0/img.png&quot; data-alt=&quot;Table&amp;amp;amp;nbsp;5&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/ylgls/btqBD1pT6m4/EnltdX7hhPtn5YGBReoYs0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fylgls%2FbtqBD1pT6m4%2FEnltdX7hhPtn5YGBReoYs0%2Fimg.png&quot; data-origin-width=&quot;0&quot; data-origin-height=&quot;0&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot;/&gt;&lt;/span&gt;&lt;figcaption&gt;Table&amp;nbsp;5&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;We demonstrate the importance of the deep bidirectionality of BERT by evaluating two pre-training objectives using exactly the same pre-training data, fine-tuning scheme, and hyperparameters as BERT&lt;sub&gt;BASE&lt;/sub&gt;.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;No NSP&lt;/b&gt;: A bidirectional model trained using the masked LM (MLM) but without the next sentence prediction (NSP) task.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;LTR &amp;amp; No NSP:&lt;/b&gt; &lt;span style=&quot;color: #333333;&quot;&gt;A left-context-only model trained using a standard Left-to-Right (LTR) LM rather than an MLM.&lt;/span&gt; The left-only constraint is also applied at fine-tuning, because a pre-train/fine-tune mismatch degrades downstream performance. Additionally, this model is pre-trained without the NSP task. It is directly comparable to OpenAI GPT, but uses our larger training dataset, our input representation, and our fine-tuning scheme. We first examine the impact of the NSP task. Table 5 shows that removing NSP hurts performance significantly on QNLI, MNLI, and SQuAD 1.1.&lt;br /&gt;Next, we evaluate the impact of training bidirectional representations by comparing No NSP to LTR &amp;amp; No NSP. The LTR model performs worse than the MLM model on all tasks, with large drops on MRPC and SQuAD.&lt;/p&gt;
&lt;p&gt;For SQuAD it is intuitively clear that an LTR model will perform poorly at token predictions, since the token-level hidden states have no right-side context. To strengthen the LTR system, we added a randomly initialized BiLSTM on top. This significantly improves results on SQuAD, but the results are still worse than those of the pre-trained bidirectional models. The BiLSTM hurts performance on the GLUE tasks.&lt;/p&gt;
&lt;p&gt;We recognize that it would also be possible to train separate LTR and RTL models and represent each token as the concatenation of the two models, as ELMo does.&lt;br /&gt;However:&lt;br /&gt;(a) this is twice as expensive as a single bidirectional model;&lt;br /&gt;(b) this is non-intuitive for tasks like QA, since the RTL model would not be able to condition the answer on the question;&lt;br /&gt;(c) this is less powerful than a deep bidirectional model, since a deep bidirectional model can use both left and right context at every layer.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;5.2&amp;nbsp;Effect&amp;nbsp;of&amp;nbsp;Model&amp;nbsp;Size&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-origin-width=&quot;0&quot; data-origin-height=&quot;0&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/TEkpr/btqBUAkWY5h/GKb0fIFnbM47FHMTy5QLU1/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/TEkpr/btqBUAkWY5h/GKb0fIFnbM47FHMTy5QLU1/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/TEkpr/btqBUAkWY5h/GKb0fIFnbM47FHMTy5QLU1/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FTEkpr%2FbtqBUAkWY5h%2FGKb0fIFnbM47FHMTy5QLU1%2Fimg.png&quot; data-origin-width=&quot;0&quot; data-origin-height=&quot;0&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;In this section, we explore the effect of model size on fine-tuning task accuracy. We trained a number of BERT models with a differing number of layers, hidden units, and attention heads, while otherwise using the same hyperparameters and training procedure.&lt;br /&gt;Results on GLUE tasks are shown in Table 6. In this table, we report the average Dev Set accuracy from 5 random restarts of fine-tuning. Larger models lead to an accuracy improvement across all datasets, even for MRPC, which only has 3,600 labeled training examples and is substantially different from the pre-training tasks. It is also perhaps surprising that we are able to achieve such significant improvements on top of models that are already quite large relative to the existing literature.&lt;br /&gt;For example, the largest Transformer encoder studied previously is (L=6, H=1024, A=16) with 100M parameters,&lt;br /&gt;and the largest Transformer we have found is ......&lt;br /&gt;By contrast, BERT&lt;sub&gt;BASE&lt;/sub&gt; contains 110M parameters and BERT&lt;sub&gt;LARGE&lt;/sub&gt; contains 340M parameters.&lt;br /&gt;&lt;br /&gt;It has long been known that increasing the model size leads to continual improvements on large-scale tasks such as machine translation and language modeling, which is demonstrated by the LM perplexity of held-out training data shown in Table 6. However, we believe that this is the first work to demonstrate that scaling up model size also leads to large improvements on very small scale tasks, provided that the model has been sufficiently pre-trained. Peters et al. (2018b) presented mixed results on the downstream task impact of increasing the pre-trained bi-LM size from two to four layers, and Melamud et al. mentioned that increasing the hidden dimension size from 200 to 600 helped, but that increasing it further to 1,000 did not bring further improvements. Both of these prior works used a feature-based approach.&lt;br /&gt;We hypothesize that when the model is fine-tuned directly on the downstream tasks and uses only a very small number of randomly initialized additional parameters, the task-specific models can benefit from the larger, more expressive pre-trained representations even when downstream task data is very small.&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;text-align: center;&quot;&gt;&lt;b&gt;&lt;span&gt;For &lt;span&gt;A Additional Details for BERT&lt;/span&gt;, please see the following post.&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;
&lt;p style=&quot;text-align: center;&quot;&gt;&lt;a href=&quot;https://ynebula.tistory.com/60&quot;&gt;https://ynebula.tistory.com/60&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>Artificial Intelligence</category>
      <author>[성운]</author>
      <guid isPermaLink="true">https://ynebula.tistory.com/59</guid>
      <comments>https://ynebula.tistory.com/59#entry59comment</comments>
      <pubDate>Sat, 1 Feb 2020 13:08:40 +0900</pubDate>
    </item>
    <item>
      <title>BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Paper Translation) - 4</title>
      <link>https://ynebula.tistory.com/58</link>
      <description>&lt;p style=&quot;text-align: center;&quot;&gt;&lt;b&gt;&lt;b&gt;&lt;span&gt;&lt;span&gt;This post is a mix of literal and free translation of the BERT paper.&lt;/span&gt;&lt;/span&gt;&lt;/b&gt;&lt;/b&gt;&lt;/p&gt;
&lt;p style=&quot;text-align: center;&quot;&gt;&lt;b&gt;&lt;span&gt;For 3 BERT, please see the following post.&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;
&lt;p style=&quot;text-align: center;&quot;&gt;&lt;b&gt;&lt;span&gt;&lt;a href=&quot;https://ynebula.tistory.com/56&quot;&gt;https://ynebula.tistory.com/55&lt;/a&gt;&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1580477595198&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-og-type=&quot;article&quot; data-og-title=&quot;BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding 논문 한글 번역 - 3.1-3.2&quot; data-og-description=&quot;BERT논문을 직역 및 의역으로 작성한 내용입니다. 3.1 Pre-training BERT Peter et al(2018a), Radford et al(2018)과 다르게, 우리는 BERT를 pre-train하기 위해 traditional left-to-right or right-to-left lan..&quot; data-og-host=&quot;ynebula.tistory.com&quot; data-og-source-url=&quot;https://ynebula.tistory.com/56&quot; data-og-url=&quot;https://ynebula.tistory.com/56&quot; data-og-image=&quot;https://scrap.kakaocdn.net/dn/qqtCl/hyELTf5cZf/0kYtbuecJZgKWOf5pF4JN0/img.png?width=800&amp;amp;height=800&amp;amp;face=0_0_800_800,https://scrap.kakaocdn.net/dn/h6v8X/hyELWcOruj/J0NTVJgkjGNBUi1kEKxhhk/img.png?width=800&amp;amp;height=800&amp;amp;face=0_0_800_800&quot;&gt;&lt;a href=&quot;https://ynebula.tistory.com/56&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://ynebula.tistory.com/56&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url('https://scrap.kakaocdn.net/dn/qqtCl/hyELTf5cZf/0kYtbuecJZgKWOf5pF4JN0/img.png?width=800&amp;amp;height=800&amp;amp;face=0_0_800_800,https://scrap.kakaocdn.net/dn/h6v8X/hyELWcOruj/J0NTVJgkjGNBUi1kEKxhhk/img.png?width=800&amp;amp;height=800&amp;amp;face=0_0_800_800');&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot;&gt;BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding 논문 한글 번역 - 3.1-3.2&lt;/p&gt;
&lt;p class=&quot;og-desc&quot;&gt;BERT논문을 직역 및 의역으로 작성한 내용입니다. 3.1 Pre-training BERT Peter et al(2018a), Radford et al(2018)과 다르게, 우리는 BERT를 pre-train하기 위해 traditional left-to-right or right-to-left lan..&lt;/p&gt;
&lt;p class=&quot;og-host&quot;&gt;ynebula.tistory.com&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p&gt;&lt;b&gt;4 Experiments&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;In this section, we present BERT fine-tuning results on 11 NLP tasks.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;4.1 GLUE&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-origin-width=&quot;0&quot; data-origin-height=&quot;0&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/crD40j/btqBCT6S9nx/MDmbEwZai1t66Om3WDdlrK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/crD40j/btqBCT6S9nx/MDmbEwZai1t66Om3WDdlrK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/crD40j/btqBCT6S9nx/MDmbEwZai1t66Om3WDdlrK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcrD40j%2FbtqBCT6S9nx%2FMDmbEwZai1t66Om3WDdlrK%2Fimg.png&quot; data-origin-width=&quot;0&quot; data-origin-height=&quot;0&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;The General Language Understanding Evaluation (GLUE) benchmark (Wang et al., 2018a) is a collection of diverse natural language understanding tasks. Detailed descriptions are given in Appendix B.1.&lt;/p&gt;
&lt;p&gt;To fine-tune on GLUE, we represent the input sequence (a single sentence or a sentence pair) as described in Section 3, and use the final hidden vector &lt;i&gt;C&lt;/i&gt;&lt;span&gt;&amp;isin;&lt;/span&gt;&lt;b&gt;R&lt;/b&gt;&lt;span&gt;&lt;sup&gt;&lt;i&gt;H&lt;/i&gt;&lt;/sup&gt;&lt;/span&gt; corresponding to the first input token ([CLS]) as the aggregate representation. The only new parameters introduced during fine-tuning are the classification layer weights &lt;i&gt;&lt;span style=&quot;color: #333333;&quot;&gt;W&lt;/span&gt;&lt;/i&gt;&lt;span style=&quot;color: #333333;&quot;&gt;&amp;isin;&lt;/span&gt;&lt;b&gt;R&lt;/b&gt;&lt;span style=&quot;color: #333333;&quot;&gt;&lt;sup&gt;&lt;i&gt;K&amp;times;H&lt;/i&gt;&lt;/sup&gt;, where &lt;i&gt;K&lt;/i&gt; is the number of labels. We compute a standard classification loss with &lt;i&gt;C&lt;/i&gt; and &lt;i&gt;W&lt;/i&gt;, i.e., log(softmax(&lt;i&gt;CW&lt;/i&gt;&lt;sup&gt;T&lt;/sup&gt;)).&lt;/span&gt;&lt;/p&gt;
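&lt;p&gt;The expression log(softmax(CW^T)) can be made concrete with a small pure-Python sketch. The function names are hypothetical, plain lists stand in for tensors, and the loss is taken as the negative log-probability of the correct label, which is the usual reading of a standard classification loss.&lt;/p&gt;
&lt;pre&gt;
```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def classification_loss(c, w, label):
    """Negative log-likelihood of `label`.

    c: the [CLS] vector C, a list of length H.
    w: the classification weights W, K rows each of length H.
    """
    # One logit per label: row k of W dotted with C, i.e. C W^T.
    logits = [sum(ci * wi for ci, wi in zip(c, row)) for row in w]
    return -log_softmax(logits)[label]
```
&lt;/pre&gt;
&lt;p&gt;With all-zero weights and K = 2, every label gets probability 1/2, so the loss is log 2.&lt;/p&gt;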
&lt;p&gt;&lt;span style=&quot;color: #333333;&quot;&gt;We use a batch size of 32 and fine-tune for 3 epochs for all tasks. For each task, we selected the best fine-tuning learning rate (among 5e-5, 4e-5, 3e-5, and 2e-5) on the Dev set. Additionally, for BERT&lt;sub&gt;LARGE&lt;/sub&gt; we found that fine-tuning was sometimes unstable on small datasets, so we ran several random restarts and selected the best model. With random restarts, we use the same pre-trained checkpoint but perform different fine-tuning data shuffling and classifier layer initialization.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style=&quot;color: #333333;&quot;&gt;Results are presented in Table 1. Both BERT&lt;sub&gt;BASE&lt;/sub&gt; and BERT&lt;sub&gt;LARGE&lt;/sub&gt; outperform all systems by a substantial margin, obtaining 4.5% and 7.0% respective average accuracy improvement over the prior state of the art. BERT&lt;sub&gt;BASE&lt;/sub&gt; and OpenAI GPT are nearly identical in terms of model architecture apart from the attention masking. On MNLI, BERT obtains a 4.6% absolute accuracy improvement. On the GLUE leaderboard, BERT&lt;sub&gt;LARGE&lt;/sub&gt; obtains a score of 80.5, compared to 72.8 for OpenAI GPT.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style=&quot;color: #333333;&quot;&gt;We find that BERT&lt;sub&gt;LARGE&lt;/sub&gt; outperforms BERT&lt;sub&gt;BASE&lt;/sub&gt; across all tasks, especially those with very little training data. The effect of model size is explored in Section 5.2.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;4.2 SQuAD v1.1&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-origin-width=&quot;0&quot; data-origin-height=&quot;0&quot; width=&quot;520&quot; height=&quot;529&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/bZR9IC/btqBE0DHTVw/6sIhZmkEw03milOJ1O63dK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/bZR9IC/btqBE0DHTVw/6sIhZmkEw03milOJ1O63dK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/bZR9IC/btqBE0DHTVw/6sIhZmkEw03milOJ1O63dK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FbZR9IC%2FbtqBE0DHTVw%2F6sIhZmkEw03milOJ1O63dK%2Fimg.png&quot; data-origin-width=&quot;0&quot; data-origin-height=&quot;0&quot; width=&quot;520&quot; height=&quot;529&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;The Stanford Question Answering Dataset (SQuAD v1.1) is a collection of 100k crowdsourced question/answer pairs. Given a question and a passage from Wikipedia containing the answer, the task is to predict the answer text span in the passage. As shown in Figure 1, in the question answering task we represent the input question and passage as a single packed sequence, with the question using the A embedding and the passage using the B embedding. During fine-tuning we only introduce a start vector &lt;i&gt;S&lt;/i&gt;&lt;span&gt;&amp;isin;&lt;/span&gt;&lt;b&gt;R&lt;/b&gt;&lt;span&gt;&lt;sup&gt;&lt;i&gt;H&lt;/i&gt;&lt;/sup&gt;&lt;/span&gt; and an end vector &lt;i&gt;E&lt;/i&gt;&lt;span style=&quot;color: #333333;&quot;&gt;&amp;isin;&lt;/span&gt;&lt;b&gt;R&lt;/b&gt;&lt;span style=&quot;color: #333333;&quot;&gt;&lt;sup&gt;&lt;i&gt;H&lt;/i&gt;&lt;/sup&gt;&lt;/span&gt;. The probability of word &lt;i&gt;i&lt;/i&gt; being the start of the answer span is computed as a dot product between &lt;i&gt;T&lt;sub&gt;i&lt;/sub&gt;&lt;/i&gt; and &lt;i&gt;S&lt;/i&gt;, followed by a softmax over all of the words in the paragraph.&lt;/p&gt;
&lt;p&gt;The analogous formula is used for the end of the answer span. The score of a candidate span from position &lt;i&gt;i&lt;/i&gt; to position &lt;i&gt;j&lt;/i&gt; is defined as &lt;i&gt;S&lt;/i&gt;&lt;b&gt;&amp;middot;&lt;/b&gt;&lt;i&gt;T&lt;sub&gt;i&lt;/sub&gt;&lt;/i&gt; + &lt;i&gt;E&lt;/i&gt;&lt;b&gt;&amp;middot;&lt;/b&gt;&lt;i&gt;T&lt;sub&gt;j&lt;/sub&gt;&lt;/i&gt;, and the maximum scoring span with &lt;i&gt;j&lt;/i&gt; &amp;ge; &lt;i&gt;i&lt;/i&gt; is used as the prediction. The training objective is the sum of the log-likelihoods of the correct start and end positions. We fine-tune for 3 epochs with a learning rate of 5e-5 and a batch size of 32.&lt;/p&gt;
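&lt;p&gt;The span selection rule (maximize S&amp;middot;T_i + E&amp;middot;T_j subject to j &amp;ge; i) can be sketched as a brute-force search. The function names are hypothetical and plain lists stand in for the hidden vectors. A practical implementation would likely also cap the span length, which is not shown here.&lt;/p&gt;
&lt;pre&gt;
```python
def dot(u, v):
    """Dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def best_span(start_vec, end_vec, token_reprs):
    """Return (score, i, j) maximizing S.T_i + E.T_j subject to j >= i."""
    start_scores = [dot(start_vec, t) for t in token_reprs]
    end_scores = [dot(end_vec, t) for t in token_reprs]
    best = None
    for i, start_score in enumerate(start_scores):
        # Only end positions at or after the start position are valid.
        for j in range(i, len(end_scores)):
            score = start_score + end_scores[j]
            if best is None or score > best[0]:
                best = (score, i, j)
    return best
```
&lt;/pre&gt;
&lt;p&gt;The constraint matters: picking the best start and the best end independently can yield an end position that lies before the start.&lt;/p&gt;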
&lt;p&gt;Table 2 shows top leaderboard entries as well as results from top published systems (Seo et al., 2017; Clark and Gardner, 2018; Peters et al., 2018a; Hu et al., 2018). The top results from the SQuAD leaderboard do not have up-to-date public system descriptions available. We therefore use modest data augmentation in our system by first fine-tuning on TriviaQA before fine-tuning on SQuAD.&lt;/p&gt;
&lt;p&gt;Our best performing system outperforms the top leaderboard system by +1.5 F1 in ensembling and +1.3 F1 as a single system. In fact, our single BERT model outperforms the top ensemble system in terms of F1 score. Without TriviaQA fine-tuning data, we only lose 0.1-0.4 F1 and still outperform existing systems.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;4.3 SQuAD v2.0&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-origin-width=&quot;0&quot; data-origin-height=&quot;0&quot; width=&quot;526&quot; height=&quot;413&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cu3gqc/btqBE0wWWji/8HK4g35qvGj3Kd2QTmUbdk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cu3gqc/btqBE0wWWji/8HK4g35qvGj3Kd2QTmUbdk/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cu3gqc/btqBE0wWWji/8HK4g35qvGj3Kd2QTmUbdk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fcu3gqc%2FbtqBE0wWWji%2F8HK4g35qvGj3Kd2QTmUbdk%2Fimg.png&quot; data-origin-width=&quot;0&quot; data-origin-height=&quot;0&quot; width=&quot;526&quot; height=&quot;413&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;The SQuAD v2.0 task extends the SQuAD v1.1 problem definition by allowing for the possibility that no short answer exists in the provided paragraph, which makes the problem more realistic. We extend the SQuAD v1.1 BERT model for this task. We treat questions that do not have an answer as having an answer span that starts and ends at the [CLS] token. The probability space for the start and end answer span positions is extended to include the position of the [CLS] token. For prediction, we compare the score of the no-answer span, &lt;i&gt;s&lt;/i&gt;&lt;sub&gt;null&lt;/sub&gt; = &lt;i&gt;S&lt;/i&gt;&lt;b&gt;&amp;middot;&lt;/b&gt;&lt;i&gt;C&lt;/i&gt; + &lt;i&gt;E&lt;/i&gt;&lt;b&gt;&amp;middot;&lt;/b&gt;&lt;i&gt;C&lt;/i&gt;, to the score of the best non-null span, &lt;i&gt;ŝ&lt;sub&gt;i,j&lt;/sub&gt;&lt;/i&gt; = max&lt;sub&gt;&lt;i&gt;j&lt;/i&gt;&amp;ge;&lt;i&gt;i&lt;/i&gt;&lt;/sub&gt; &lt;i&gt;S&lt;/i&gt;&lt;b&gt;&amp;middot;&lt;/b&gt;&lt;i&gt;T&lt;sub&gt;i&lt;/sub&gt;&lt;/i&gt; + &lt;i&gt;E&lt;/i&gt;&lt;b&gt;&amp;middot;&lt;/b&gt;&lt;i&gt;T&lt;sub&gt;j&lt;/sub&gt;&lt;/i&gt;. We predict a non-null answer when &lt;i&gt;ŝ&lt;sub&gt;i,j&lt;/sub&gt;&lt;/i&gt; &amp;gt; &lt;i&gt;s&lt;/i&gt;&lt;sub&gt;null&lt;/sub&gt; + &amp;tau;, &lt;span style=&quot;color: #333333;&quot;&gt;where the threshold &amp;tau; is selected on the dev set to maximize F1&lt;/span&gt;. We did not use TriviaQA data for this model. We fine-tuned for 2 epochs with a learning rate of 5e-5 and a batch size of 48.&lt;/p&gt;
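&lt;p&gt;The no-answer decision above compares the best non-null span score with s_null + &amp;tau;. The sketch below is illustrative only: the function names are hypothetical, plain lists stand in for the hidden vectors, and the brute-force loop is chosen for clarity.&lt;/p&gt;
&lt;pre&gt;
```python
def dot(u, v):
    """Dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def predict_span(start_vec, end_vec, cls_repr, token_reprs, tau):
    """Return the best (i, j) span, or None when 'no answer' is predicted."""
    # Score of the no-answer span, which starts and ends at [CLS].
    s_null = dot(start_vec, cls_repr) + dot(end_vec, cls_repr)
    best_score, best_ij = None, None
    for i, t_i in enumerate(token_reprs):
        for j in range(i, len(token_reprs)):
            score = dot(start_vec, t_i) + dot(end_vec, token_reprs[j])
            if best_score is None or score > best_score:
                best_score, best_ij = score, (i, j)
    # Predict a non-null answer only if it beats s_null by more than tau.
    if best_score is not None and best_score > s_null + tau:
        return best_ij
    return None
```
&lt;/pre&gt;
&lt;p&gt;Raising &amp;tau; makes the model abstain more often, which is why it is tuned on the dev set.&lt;/p&gt;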
&lt;p&gt;The results compared to prior leaderboard entries and top published work are shown in Table 3. We observe a +5.1 F1 improvement over the previous best system.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;4.4 SWAG&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-origin-width=&quot;0&quot; data-origin-height=&quot;0&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/cmljUQ/btqBFwoSDV7/ULIzMkrIID8ANQixLJXhVK/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/cmljUQ/btqBFwoSDV7/ULIzMkrIID8ANQixLJXhVK/img.png&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/cmljUQ/btqBFwoSDV7/ULIzMkrIID8ANQixLJXhVK/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FcmljUQ%2FbtqBFwoSDV7%2FULIzMkrIID8ANQixLJXhVK%2Fimg.png&quot; data-origin-width=&quot;0&quot; data-origin-height=&quot;0&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot;/&gt;&lt;/span&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;The Situations With Adversarial Generations (SWAG) dataset contains 113k sentence-pair completion examples that evaluate grounded common-sense inference. Given a sentence, the task is to choose the most plausible continuation among four choices. When fine-tuning on SWAG, we construct four input sequences, each containing the concatenation of the given sentence (sentence A) and a possible continuation (sentence B). The only task-specific parameter introduced is a vector whose dot product with the [CLS] token representation &lt;i&gt;C&lt;/i&gt; gives a score for each choice, which is normalized with a softmax layer.&lt;/p&gt;
&lt;p&gt;We fine-tuned for 3 epochs with a learning rate of 2e-5 and a batch size of 16. Results are shown in Table 4. BERT&lt;sub&gt;LARGE&lt;/sub&gt; outperforms ESIM+ELMo by +27.1% and OpenAI GPT by +8.3%.&lt;/p&gt;
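&lt;p&gt;As a rough sketch of the scoring step described above: each of the four sequences yields its own [CLS] vector, a single task-specific vector is dotted with each one, and a softmax turns the four scores into probabilities. The function name, the vector V, and all numbers below are hypothetical and chosen only for illustration.&lt;/p&gt;

```python
import math


def choice_probabilities(C_per_choice, V):
    """Softmax over the scores V . C_k, one per candidate continuation."""
    scores = [sum(v * c for v, c in zip(V, C)) for C in C_per_choice]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical [CLS] vectors for the four (sentence A, continuation) sequences.
C_per_choice = [[0.2, 0.1], [1.0, 0.5], [0.0, 0.3], [0.4, 0.4]]
V = [1.0, 2.0]  # stands in for the vector learned during fine-tuning
probs = choice_probabilities(C_per_choice, V)
predicted = probs.index(max(probs))  # index of the most plausible continuation
```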
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;text-align: center;&quot;&gt;&lt;b&gt;&lt;span&gt;For Section 5, Ablation Studies, see the following post.&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;
&lt;p style=&quot;text-align: center;&quot;&gt;&lt;b&gt;&lt;a href=&quot;https://ynebula.tistory.com/59&quot;&gt;https://ynebula.tistory.com/59&lt;/a&gt;&lt;/b&gt;&lt;/p&gt;</description>
      <category>Artificial Intelligence</category>
      <category>BERT</category>
      <category>nlp</category>
      <author>[성운]</author>
      <guid isPermaLink="true">https://ynebula.tistory.com/58</guid>
      <comments>https://ynebula.tistory.com/58#entry58comment</comments>
      <pubDate>Thu, 30 Jan 2020 22:00:26 +0900</pubDate>
    </item>
    <item>
      <title>BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Paper Translation - 3.1-3.2</title>
      <link>https://ynebula.tistory.com/56</link>
      <description>&lt;p style=&quot;text-align: center;&quot;&gt;&lt;b&gt;&lt;span&gt;This post is a translation of the BERT paper that mixes literal and free rendering.&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;
&lt;p style=&quot;text-align: center;&quot;&gt;&lt;b&gt;&lt;span&gt;For Section 3, BERT, see the following post.&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;
&lt;p style=&quot;text-align: center;&quot;&gt;&lt;a href=&quot;https://ynebula.tistory.com/55&quot;&gt;https://ynebula.tistory.com/55&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;&lt;span&gt;3.1 Pre-training BERT&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;Unlike Peters et al. (2018a) and Radford et al. (2018), we do not use traditional left-to-right or right-to-left language models to pre-train BERT. Instead, we pre-train BERT using two unsupervised tasks.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;&lt;span&gt;Task #1: Masked LM&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;We believe that a deep bidirectional model is more powerful than either a left-to-right or a right-to-left model. Unfortunately, standard conditional language models can only be trained left-to-right or right-to-left, because bidirectional conditioning would allow each word to indirectly see itself, and the model could then trivially predict the target word in a multi-layered context.&lt;/p&gt;
&lt;p&gt;To train a deep bidirectional representation, we mask some percentage of the input tokens at random and then predict those masked tokens. We refer to this procedure as a &amp;ldquo;masked LM&amp;rdquo; (MLM). In this case, the final hidden vectors corresponding to the mask tokens are fed into an output softmax over the vocabulary.&lt;/p&gt;
&lt;p&gt;We mask 15% of all WordPiece tokens in each sequence at random. Rather than reconstructing the entire input, we predict only the masked words.&lt;/p&gt;
&lt;p&gt;Although this gives us a bidirectional pre-trained model, a downside is that it creates a mismatch between pre-training and fine-tuning, since the [MASK] token does not appear during fine-tuning. To mitigate this, we do not always replace &amp;ldquo;masked&amp;rdquo; words with the actual [MASK] token. The training data generator chooses 15% of the token positions at random for prediction. If the &lt;i&gt;i&lt;/i&gt;-th token is chosen, we replace it with the [MASK] token 80% of the time, with a random token 10% of the time, and leave it unchanged 10% of the time. Then &lt;i&gt;T&lt;sub&gt;i&lt;/sub&gt;&lt;/i&gt; is used to predict the original token with cross entropy loss. We compare variations of this procedure in Appendix C.2.&lt;/p&gt;
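&lt;p&gt;The 80/10/10 replacement rule can be sketched as follows. This is a simplified illustration, not the actual data generator: it selects each position independently with probability 0.15 instead of choosing exactly 15% of positions, it does not special-case [CLS] or [SEP], and the function name, toy vocabulary, and seed are assumptions made for the example.&lt;/p&gt;

```python
import random


def mask_tokens(tokens, vocab, rng, select_prob=0.15):
    """Return (corrupted tokens, {position: original token}) for MLM training."""
    corrupted = list(tokens)
    targets = {}
    for i, token in enumerate(tokens):
        if rng.random() >= select_prob:
            continue  # position not chosen for prediction
        targets[i] = token  # the model must recover the original token here
        roll = rng.random()
        if roll >= 0.2:  # 80% of chosen positions: replace with [MASK]
            corrupted[i] = "[MASK]"
        elif roll >= 0.1:  # 10%: replace with a random vocabulary token
            corrupted[i] = rng.choice(vocab)
        # remaining 10%: keep the token unchanged
    return corrupted, targets


rng = random.Random(0)  # fixed seed so the sketch is repeatable
vocab = ["my", "dog", "is", "hairy", "apple"]
tokens = ["my", "dog", "is", "hairy"] * 250
corrupted, targets = mask_tokens(tokens, vocab, rng)
```

&lt;p&gt;Only the positions recorded in targets would contribute to the cross entropy loss; every other position is left untouched in the input.&lt;/p&gt;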
&lt;p&gt;&lt;b&gt;&lt;span&gt;Task #2: Next Sentence Prediction (NSP)&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;Many important downstream tasks, such as Question Answering (QA) and Natural Language Inference (NLI), are based on understanding the relationship between two sentences. To train a model that understands sentence relationships, we pre-train on a binarized next sentence prediction task.&lt;/p&gt;
&lt;p&gt;Specifically, when choosing sentences A and B for each pre-training example, 50% of the time B is the actual sentence that follows A (labeled as IsNext), and 50% of the time it is a random sentence from the corpus (labeled as NotNext). As shown in Figure 1, &lt;i&gt;C&lt;/i&gt; is used for next sentence prediction (NSP). As we explain in Section 5.1, pre-training on this task is very beneficial to both QA and NLI.&lt;/p&gt;
&lt;p&gt;The NSP task is closely related to the representation-learning objectives used in Jernite et al. (2017) and Logeswaran and Lee (2018). However, in prior work only sentence embeddings are transferred to down-stream tasks, whereas BERT transfers all parameters to initialize the end-task model parameters.&lt;/p&gt;
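&lt;p&gt;A minimal sketch of how such 50/50 sentence pairs could be generated is shown below. The function name, the two toy documents, and the choice to draw the random sentence from a different document are all assumptions made for illustration; the sketch also assumes at least two documents with at least two sentences each.&lt;/p&gt;

```python
import random


def make_nsp_example(documents, rng):
    """Pick sentence A and a sentence B labeled IsNext or NotNext (50/50)."""
    doc_idx = rng.randrange(len(documents))
    doc = documents[doc_idx]
    a_idx = rng.randrange(len(doc) - 1)  # A must have a successor
    if rng.random() >= 0.5:
        return doc[a_idx], doc[a_idx + 1], "IsNext"
    # NotNext: draw B from a different document (a simplifying assumption).
    others = [d for k, d in enumerate(documents) if k != doc_idx]
    return doc[a_idx], rng.choice(rng.choice(others)), "NotNext"


# Toy corpus, invented for illustration.
documents = [
    ["the man went to the store", "he bought a gallon of milk"],
    ["penguins are flightless birds", "they live in the southern hemisphere"],
]
rng = random.Random(0)
examples = [make_nsp_example(documents, rng) for _ in range(200)]
```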
&lt;p&gt;&lt;b&gt;&lt;span&gt;Pre-training data&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;The pre-training procedure largely follows the existing literature on language model pre-training. For the pre-training corpus we use the BooksCorpus (800M words) (Zhu et al., 2015) and English Wikipedia (2,500M words). For Wikipedia we extract only the text passages and ignore lists, tables, and headers. To extract long contiguous sequences, it is important to use a document-level corpus rather than a shuffled sentence-level corpus such as the Billion Word Benchmark.&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;span&gt;&lt;b&gt;&lt;span&gt;3.2 Fine-tuning BERT&lt;/span&gt;&lt;/b&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Fine-tuning is straightforward because the self-attention mechanism in the Transformer allows BERT to model many downstream tasks, whether they involve a single text or text pairs, by swapping out the appropriate inputs and outputs.&lt;/p&gt;
&lt;p&gt;For applications involving text pairs, a common pattern is to independently encode the text pairs before applying bidirectional cross attention (Parikh et al. (2016), Seo et al. (2017)).&lt;/p&gt;
&lt;p&gt;BERT instead uses the self-attention mechanism to unify these two stages, because encoding a concatenated text pair with self-attention effectively includes bidirectional cross attention between the two sentences.&lt;/p&gt;
&lt;p&gt;For each task, we simply plug the task-specific inputs and outputs into BERT and fine-tune all the parameters end-to-end.&lt;/p&gt;
&lt;p&gt;&lt;span&gt;At the input, sentence A and sentence B from pre-training are analogous to the following:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span&gt;sentence pairs in paraphrasing,&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span&gt;hypothesis-premise pairs in entailment,&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span&gt;question-passage pairs in question answering,&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;&lt;span&gt;a degenerate text-&amp;empty; pair in text classification or sequence tagging&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;At the output, the token representations are fed into an output layer for token-level tasks such as sequence tagging or question answering, and the [CLS] representation is fed into an output layer for classification tasks such as entailment or sentiment analysis.&lt;/p&gt;
&lt;p&gt;Compared to pre-training, fine-tuning is relatively inexpensive. All of the results in the paper can be replicated in at most 1 hour on a single Cloud TPU, or a few hours on a GPU, starting from the exact same pre-trained model. We describe the task-specific details in the corresponding subsections of Section 4. More details can be found in Appendix A.5.&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;text-align: center;&quot;&gt;&lt;b&gt;&lt;span&gt;For Section 4, see the following post.&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;
&lt;p style=&quot;text-align: center;&quot;&gt;&lt;span&gt;&lt;a href=&quot;https://ynebula.tistory.com/58&quot;&gt;https://ynebula.tistory.com/58&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>BERT</category>
      <category>nlp</category>
      <author>[성운]</author>
      <guid isPermaLink="true">https://ynebula.tistory.com/56</guid>
      <comments>https://ynebula.tistory.com/56#entry56comment</comments>
      <pubDate>Wed, 29 Jan 2020 21:27:22 +0900</pubDate>
    </item>
    <item>
      <title>BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Paper Translation - 3</title>
      <link>https://ynebula.tistory.com/55</link>
      <description>&lt;p style=&quot;text-align: center;&quot;&gt;&lt;b&gt;&lt;span&gt;This post is a translation of the BERT paper that mixes literal and free rendering.&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;
&lt;p style=&quot;text-align: center;&quot;&gt;&lt;b&gt;&lt;span&gt;For Section 2, Related Work, see the following post.&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;
&lt;p style=&quot;text-align: center;&quot;&gt;&lt;b&gt;&lt;span&gt;&lt;span&gt;&lt;a href=&quot;https://ynebula.tistory.com/54&quot;&gt;https://ynebula.tistory.com/54&lt;/a&gt;&lt;/span&gt;&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;&lt;span&gt;3. BERT&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; data-origin-width=&quot;0&quot; data-origin-height=&quot;0&quot; width=&quot;780&quot; height=&quot;318&quot; data-ke-mobilestyle=&quot;widthContent&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/DyeLm/btqBt9PkSb0/UYejojStXnMQbbSv3ezll0/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/DyeLm/btqBt9PkSb0/UYejojStXnMQbbSv3ezll0/img.png&quot; data-alt=&quot;Figure 1&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/DyeLm/btqBt9PkSb0/UYejojStXnMQbbSv3ezll0/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FDyeLm%2FbtqBt9PkSb0%2FUYejojStXnMQbbSv3ezll0%2Fimg.png&quot; data-origin-width=&quot;0&quot; data-origin-height=&quot;0&quot; width=&quot;780&quot; height=&quot;318&quot; data-ke-mobilestyle=&quot;widthContent&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot;/&gt;&lt;/span&gt;&lt;figcaption&gt;Figure 1&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;&lt;span&gt;We introduce BERT and its detailed implementation in this section. There are two steps: pre-training and fine-tuning. During pre-training, the model is trained on unlabeled data over different pre-training tasks. For fine-tuning, the BERT model is first initialized with the pre-trained parameters, and all of the parameters are then fine-tuned using labeled data from the downstream tasks. Each downstream task has its own fine-tuned model, even though all of them are initialized with the same pre-trained parameters. The question-answering example in Figure 1 serves as a running example for this section.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span&gt;A distinctive feature of BERT is its unified architecture across different tasks. There is only a minimal difference between the pre-trained architecture and the final downstream architecture.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;&lt;span&gt;Model Architecture&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;span&gt;BERT&apos;s model architecture is a multi-layer bidirectional Transformer encoder based on the original implementation described in Vaswani et al. (2017) and released in the tensor2tensor library. Because the use of Transformers has become common and our implementation is almost identical to the original, we omit a background description of the model architecture and refer readers to Vaswani et al. (2017) as well as excellent guides such as &quot;The Annotated Transformer&quot;.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span&gt;In this paper, L denotes the number of layers (i.e., Transformer blocks), H the hidden size, and A the number of self-attention heads. We provide two models.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span&gt;BERT&lt;sub&gt;BASE&lt;/sub&gt; (&lt;span style=&quot;color: #333333;&quot;&gt;L=12, H=768, A=12, Total Parameters=110M&lt;/span&gt;)&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span&gt;&lt;span style=&quot;color: #333333;&quot;&gt;BERT&lt;sub&gt;LARGE&lt;/sub&gt; (&lt;/span&gt;&lt;span style=&quot;color: #333333;&quot;&gt;L=24, H=1024, A=16, Total Parameters=340M&lt;/span&gt;&lt;span style=&quot;color: #333333;&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
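&lt;p&gt;As a back-of-the-envelope check on these totals, the parameter count of a Transformer encoder can be estimated from L and H alone. The sketch below rests on several assumptions that are not stated in this post: a 30,000-token vocabulary, 512 learned positions, two segment types, a feed-forward size of 4H, bias terms on every dense layer, and a final pooling layer on the [CLS] vector. The function name is hypothetical. Note that A does not change the count, since the heads split H between them.&lt;/p&gt;

```python
def approx_bert_parameters(L, H, vocab_size=30000, max_positions=512):
    """Rough parameter count for a BERT-style Transformer encoder."""
    layer_norm = 2 * H  # scale and bias
    # Token, position, and segment (2 types) tables plus one layer norm.
    embeddings = (vocab_size + max_positions + 2) * H + layer_norm
    attention = 4 * (H * H + H)  # query, key, value, output projections
    feed_forward = (H * 4 * H + 4 * H) + (4 * H * H + H)  # assumes size 4H
    per_layer = attention + feed_forward + 2 * layer_norm
    pooler = H * H + H  # dense layer applied to the [CLS] vector
    return embeddings + L * per_layer + pooler


base = approx_bert_parameters(L=12, H=768)
large = approx_bert_parameters(L=24, H=1024)
print(round(base / 1e6), round(large / 1e6))
```

&lt;p&gt;Under these assumptions the estimates land at roughly 109M and 335M, in the neighborhood of the 110M and 340M quoted above; the exact figures depend on details such as the real vocabulary size.&lt;/p&gt;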
&lt;p&gt;&lt;span&gt;BERT&lt;sub&gt;BASE&lt;/sub&gt; was chosen to have the same model size as OpenAI GPT for comparison purposes. However, the BERT Transformer uses bidirectional self-attention, while the GPT Transformer uses constrained self-attention in which every token can only attend to the context to its left.&lt;/span&gt;&lt;/p&gt;
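&lt;p&gt;The difference between the two attention patterns can be pictured as a mask over token pairs. The sketch below is illustrative only; the function name is made up, and letting each token attend to its own position in the left-only case is an assumption that follows the usual causal-mask convention.&lt;/p&gt;

```python
def attention_mask(n, bidirectional):
    """mask[i][j] is 1 when token i may attend to token j, else 0."""
    return [
        [1 if bidirectional or i >= j else 0 for j in range(n)]
        for i in range(n)
    ]


bert_mask = attention_mask(4, bidirectional=True)   # every token sees every token
gpt_mask = attention_mask(4, bidirectional=False)   # token i sees positions 0..i only
for row in gpt_mask:
    print(row)
```

&lt;p&gt;The bidirectional mask is all ones, while the left-only mask is lower triangular, so in that case the last token is the only one that sees the whole sequence.&lt;/p&gt;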
&lt;p&gt;&lt;b&gt;&lt;span&gt;Input/Output Representations&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;imageblock alignCenter&quot; width=&quot;785&quot; height=&quot;255&quot; data-origin-width=&quot;0&quot; data-origin-height=&quot;0&quot; data-ke-mobilestyle=&quot;widthContent&quot;&gt;&lt;span data-url=&quot;https://blog.kakaocdn.net/dn/b3ZnBg/btqBzkuRbW2/Lh501cekrCHsyPoWMXSAgk/img.png&quot; data-phocus=&quot;https://blog.kakaocdn.net/dn/b3ZnBg/btqBzkuRbW2/Lh501cekrCHsyPoWMXSAgk/img.png&quot; data-alt=&quot;Figure 2&quot;&gt;&lt;img src=&quot;https://blog.kakaocdn.net/dn/b3ZnBg/btqBzkuRbW2/Lh501cekrCHsyPoWMXSAgk/img.png&quot; srcset=&quot;https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&amp;fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2Fb3ZnBg%2FbtqBzkuRbW2%2FLh501cekrCHsyPoWMXSAgk%2Fimg.png&quot; width=&quot;785&quot; height=&quot;255&quot; data-origin-width=&quot;0&quot; data-origin-height=&quot;0&quot; data-ke-mobilestyle=&quot;widthContent&quot; onerror=&quot;this.onerror=null; this.src='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png'; this.srcset='//t1.daumcdn.net/tistory_admin/static/images/no-image-v1.png';&quot;/&gt;&lt;/span&gt;&lt;figcaption&gt;Figure 2&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;To let BERT handle a variety of down-stream tasks, the input representation can unambiguously represent both a single sentence and a pair of sentences (e.g., &amp;lt;Question, Answer&amp;gt;) in one token sequence. Throughout this work, a &amp;ldquo;sentence&amp;rdquo; can be an arbitrary span of contiguous text rather than an actual linguistic sentence. A &amp;ldquo;sequence&amp;rdquo; refers to the input token sequence to BERT, which may be a single sentence or two sentences packed together.&lt;/p&gt;
&lt;p&gt;We use WordPiece embeddings with a 30,000-token vocabulary. The first token of every sequence is always a special classification token ([CLS]). The final hidden state corresponding to this token is used as the aggregate sequence representation for classification tasks.&lt;/p&gt;
&lt;p&gt;Sentence pairs are packed together into a single sequence. We differentiate the sentences in two ways. First, we separate them with a special token ([SEP]). Second, we add a learned embedding to every token indicating whether it belongs to sentence A or sentence B. As shown in Figure 1, we denote the input embedding as &lt;i&gt;E&lt;/i&gt;, the final hidden vector of the special [CLS] token as &lt;i&gt;C&lt;/i&gt; &amp;isin; &lt;b&gt;R&lt;/b&gt;&lt;sup&gt;&lt;i&gt;H&lt;/i&gt;&lt;/sup&gt;, and the final hidden vector for the &lt;i&gt;i&lt;/i&gt;&lt;sup&gt;th&lt;/sup&gt; input token as &lt;i&gt;T&lt;sub&gt;i&lt;/sub&gt;&lt;/i&gt; &amp;isin; &lt;b&gt;R&lt;/b&gt;&lt;sup&gt;&lt;i&gt;H&lt;/i&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;For a given token, its input representation is constructed by summing the corresponding token, segment, and position embeddings. This construction is visualized in Figure 2.&lt;/p&gt;
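&lt;p&gt;A small sketch of this packing step is given below. The function name and example tokens are assumptions for illustration, the tokens are plain words rather than real WordPiece output, and giving [CLS] and the first [SEP] the sentence A segment id is the convention assumed here.&lt;/p&gt;

```python
def pack_pair(tokens_a, tokens_b=None):
    """Build [CLS] A [SEP] (B [SEP]) plus per-token segment ids."""
    tokens = ["[CLS]"] + list(tokens_a) + ["[SEP]"]
    segment_ids = [0] * len(tokens)  # segment A, with [CLS] and the first [SEP]
    if tokens_b is not None:
        tokens += list(tokens_b) + ["[SEP]"]
        segment_ids += [1] * (len(tokens_b) + 1)  # segment B and the final [SEP]
    return tokens, segment_ids


tokens, segment_ids = pack_pair(
    ["my", "dog", "is", "cute"], ["he", "likes", "playing"]
)
single_tokens, single_segments = pack_pair(["my", "dog", "is", "cute"])
```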
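&lt;p&gt;The element-wise sum of the three embeddings can be sketched as follows. Everything here is a toy assumption: the tables are randomly initialized lists instead of learned parameters, the hidden size is 4, and the function names, token ids, and segment ids are invented for illustration.&lt;/p&gt;

```python
import random


def make_table(rows, dim, rng):
    """A randomly initialized embedding table with shape rows x dim."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]


def input_representation(token_ids, segment_ids, tok_emb, seg_emb, pos_emb):
    """Sum token, segment, and position embeddings at each position."""
    return [
        [t + s + p for t, s, p in zip(tok_emb[tid], seg_emb[sid], pos_emb[pos])]
        for pos, (tid, sid) in enumerate(zip(token_ids, segment_ids))
    ]


rng = random.Random(0)
H = 4  # tiny hidden size for illustration
tok_emb = make_table(10, H, rng)  # 10-token toy vocabulary
seg_emb = make_table(2, H, rng)   # sentence A / sentence B
pos_emb = make_table(8, H, rng)   # up to 8 positions
token_ids = [0, 5, 6, 1, 7, 1]
segment_ids = [0, 0, 0, 0, 1, 1]
reps = input_representation(token_ids, segment_ids, tok_emb, seg_emb, pos_emb)
```

&lt;p&gt;Each position ends up with one H-dimensional vector, so the same token id gets a different representation when it appears at a different position or in a different segment.&lt;/p&gt;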
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;text-align: center;&quot;&gt;&lt;b&gt;&lt;span&gt;For the rest of Section 3, see the following post.&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;
&lt;p style=&quot;text-align: center;&quot;&gt;&lt;span&gt;&lt;a href=&quot;https://ynebula.tistory.com/56&quot;&gt;https://ynebula.tistory.com/56&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1580470856299&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-og-type=&quot;article&quot; data-og-title=&quot;BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding 논문 한글 번역 - 3.1-3.2&quot; data-og-description=&quot;BERT논문을 직역 및 의역으로 작성한 내용입니다. 3.1 Pre-training BERT Peter et al(2018a), Radford et al(2018)과 다르게, 우리는 BERT를 pre-train하기 위해 traditional left-to-right or right-to-left lan..&quot; data-og-host=&quot;ynebula.tistory.com&quot; data-og-source-url=&quot;https://ynebula.tistory.com/56&quot; data-og-url=&quot;https://ynebula.tistory.com/56&quot; data-og-image=&quot;https://scrap.kakaocdn.net/dn/qqtCl/hyELTf5cZf/0kYtbuecJZgKWOf5pF4JN0/img.png?width=800&amp;amp;height=800&amp;amp;face=0_0_800_800,https://scrap.kakaocdn.net/dn/h6v8X/hyELWcOruj/J0NTVJgkjGNBUi1kEKxhhk/img.png?width=800&amp;amp;height=800&amp;amp;face=0_0_800_800&quot;&gt;&lt;a href=&quot;https://ynebula.tistory.com/56&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://ynebula.tistory.com/56&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url('https://scrap.kakaocdn.net/dn/qqtCl/hyELTf5cZf/0kYtbuecJZgKWOf5pF4JN0/img.png?width=800&amp;amp;height=800&amp;amp;face=0_0_800_800,https://scrap.kakaocdn.net/dn/h6v8X/hyELWcOruj/J0NTVJgkjGNBUi1kEKxhhk/img.png?width=800&amp;amp;height=800&amp;amp;face=0_0_800_800');&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot;&gt;BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Korean Translation of the Paper - 3.1-3.2&lt;/p&gt;
&lt;p class=&quot;og-desc&quot;&gt;This post is a mixed literal and free translation of the BERT paper. 3.1 Pre-training BERT Unlike Peters et al. (2018a) and Radford et al. (2018), for pre-training BERT, traditional left-to-right or right-to-left lan..&lt;/p&gt;
&lt;p class=&quot;og-host&quot;&gt;ynebula.tistory.com&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>Artificial Intelligence</category>
      <category>BERT</category>
      <category>nlp</category>
      <author>[성운]</author>
      <guid isPermaLink="true">https://ynebula.tistory.com/55</guid>
      <comments>https://ynebula.tistory.com/55#entry55comment</comments>
      <pubDate>Mon, 27 Jan 2020 20:59:51 +0900</pubDate>
    </item>
    <item>
      <title>BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Korean Translation of the Paper - 2</title>
      <link>https://ynebula.tistory.com/54</link>
      <description>&lt;p style=&quot;text-align: center;&quot;&gt;&lt;b&gt;&lt;span&gt;&lt;span&gt;This post is a mixed literal and free translation of the BERT paper.&lt;/span&gt;&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;
&lt;p style=&quot;text-align: center;&quot;&gt;&lt;b&gt;&lt;span&gt;&lt;span&gt;&lt;span style=&quot;color: #333333;&quot;&gt;For the BERT Abstract, please see the following post.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/b&gt;&lt;/p&gt;
&lt;p style=&quot;text-align: center;&quot;&gt;&lt;a href=&quot;https://ynebula.tistory.com/53&quot;&gt;https://ynebula.tistory.com/53&lt;/a&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1580470741544&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-og-type=&quot;article&quot; data-og-title=&quot;BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding 논문 한글 번역 - 1&quot; data-og-description=&quot;BERT논문을 직역 및 의역으로 작성한 내용입니다. Abstract 새로운 language representation model BERT를 소개합니다. BERT는 Transformer의 Bidirectional Encoder Representations을 사용합니다. 최근 language..&quot; data-og-host=&quot;ynebula.tistory.com&quot; data-og-source-url=&quot;https://ynebula.tistory.com/53&quot; data-og-url=&quot;https://ynebula.tistory.com/53&quot; data-og-image=&quot;https://scrap.kakaocdn.net/dn/KOcGh/hyELRWROgt/Zb8yxQLbfQjsbHw4taZxXk/img.png?width=800&amp;amp;height=800&amp;amp;face=0_0_800_800,https://scrap.kakaocdn.net/dn/bxLizg/hyEL1kT0rd/9sQqktREhCUtlSm0p6au3K/img.png?width=800&amp;amp;height=800&amp;amp;face=0_0_800_800&quot;&gt;&lt;a href=&quot;https://ynebula.tistory.com/53&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://ynebula.tistory.com/53&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url('https://scrap.kakaocdn.net/dn/KOcGh/hyELRWROgt/Zb8yxQLbfQjsbHw4taZxXk/img.png?width=800&amp;amp;height=800&amp;amp;face=0_0_800_800,https://scrap.kakaocdn.net/dn/bxLizg/hyEL1kT0rd/9sQqktREhCUtlSm0p6au3K/img.png?width=800&amp;amp;height=800&amp;amp;face=0_0_800_800');&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot;&gt;BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Korean Translation of the Paper - 1&lt;/p&gt;
&lt;p class=&quot;og-desc&quot;&gt;This post is a mixed literal and free translation of the BERT paper. Abstract We introduce a new language representation model, BERT. BERT stands for Bidirectional Encoder Representations from Transformers. Recent language..&lt;/p&gt;
&lt;p class=&quot;og-host&quot;&gt;ynebula.tistory.com&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p&gt;&lt;b&gt;2 Related Work&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;There is a long history of pre-training general language representations. We briefly review the most widely used approaches here.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;2.1 Unsupervised Feature-based Approaches&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;Learning widely applicable representations of words has been an active area of research for decades, including neural (Mikolov et al., 2013; Pennington et al., 2014) and non-neural (Brown et al., 1992; Ando and Zhang, 2005; Blitzer et al., 2006) methods. Pre-trained word embeddings are an integral part of modern NLP systems, offering significant improvements over embeddings learned from scratch. To pre-train word embedding vectors, left-to-right language modeling objectives have been used, as well as objectives to discriminate correct from incorrect words in left and right context (Mikolov et al., 2013). These approaches have been generalized to coarser granularities, such as sentence embeddings (Kiros et al., 2015; Logeswaran and Lee, 2018) or paragraph embeddings (Le and Mikolov, 2014). To train sentence representations, prior work has used 1) objectives to rank candidate next sentences (Jernite et al., 2017; Logeswaran and Lee, 2018), 2) left-to-right generation of next-sentence words given a representation of the previous sentence, or 3) denoising auto-encoder derived objectives (Hill et al., 2016).&lt;/p&gt;
&lt;p&gt;ELMo and its predecessor (Peters et al., 2017, 2018a) generalize traditional word embedding research along a different dimension. They extract context-sensitive features from a left-to-right and a right-to-left language model. The contextual representation of each token is the concatenation of the left-to-right and right-to-left representations. When integrating contextual word embeddings with existing task-specific architectures, ELMo advances the state of the art for several major NLP benchmarks, including question answering (Rajpurkar et al., 2016), sentiment analysis (Socher et al., 2013), and named entity recognition (Tjong Kim Sang and De Meulder, 2003). Melamud et al. (2016) proposed learning contextual representations through a task to predict a single word from both left and right context using LSTMs. Similar to ELMo, their model is feature-based and not deeply bidirectional. Fedus et al. (2018) show that the cloze task can be used to improve the robustness of text generation models.&lt;/p&gt;
&lt;p&gt;&lt;b&gt;2.2 Unsupervised Fine-tuning Approaches&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;As with the feature-based approaches, the first works in this direction only pre-trained word embedding parameters from unlabeled text (Collobert and Weston, 2008).&lt;/p&gt;
&lt;p&gt;More recently, sentence or document encoders that produce contextual token representations have been pre-trained from unlabeled text and fine-tuned for a supervised downstream task (Dai and Le, 2015; Howard and Ruder, 2018; Radford et al., 2018). The advantage of these approaches is that few parameters need to be learned from scratch. OpenAI GPT previously achieved state-of-the-art results on many sentence-level tasks from the GLUE benchmark (Wang et al., 2018a). Left-to-right language modeling and auto-encoder objectives have been used for pre-training such models (Howard and Ruder, 2018; Radford et al., 2018; Dai and Le, 2015).&lt;/p&gt;
&lt;p&gt;&lt;b&gt;2.3 Transfer Learning from Supervised Data&lt;/b&gt;&lt;/p&gt;
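&lt;p&gt;There has also been work showing effective transfer from supervised tasks with large datasets, such as natural language inference (Conneau et al., 2017) and machine translation (McCann et al., 2017). Computer vision research has also demonstrated the importance of transfer learning from large pre-trained models, for example models pre-trained with ImageNet (Deng et al., 2009; Yosinski et al., 2014).&lt;/p&gt;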
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p style=&quot;text-align: center;&quot;&gt;&lt;b&gt;For Section 3, BERT, please see the following post.&lt;/b&gt;&lt;/p&gt;
&lt;p style=&quot;text-align: center;&quot;&gt;&lt;a href=&quot;https://ynebula.tistory.com/55&quot;&gt;https://ynebula.tistory.com/55&lt;/a&gt;&lt;/p&gt;
&lt;figure id=&quot;og_1580470788828&quot; contenteditable=&quot;false&quot; data-ke-type=&quot;opengraph&quot; data-og-type=&quot;article&quot; data-og-title=&quot;BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding 논문 한글 번역 - 3&quot; data-og-description=&quot;BERT논문을 직역 및 의역으로 작성한 내용입니다. 3. BERT 우리는 BERT와 자세한 구현법을 이 Section에서 소개합니다. 두 개의 절차가 있습니다(pre-training과 fine-tuning). Pre-training동안, 여러 pre-train..&quot; data-og-host=&quot;ynebula.tistory.com&quot; data-og-source-url=&quot;https://ynebula.tistory.com/55&quot; data-og-url=&quot;https://ynebula.tistory.com/55&quot; data-og-image=&quot;https://scrap.kakaocdn.net/dn/d06cLZ/hyELSuHROO/bXseCpYWeZssFbDPhVQTQ0/img.png?width=602&amp;amp;height=246&amp;amp;face=0_0_602_246,https://scrap.kakaocdn.net/dn/YiDa7/hyELPY3hfb/83oJou7B5QXwFkCmGrl0q0/img.png?width=602&amp;amp;height=246&amp;amp;face=0_0_602_246,https://scrap.kakaocdn.net/dn/bz5OTY/hyELRigXfU/PxWrvr2ZrZJY2yB9hQNpT1/img.png?width=602&amp;amp;height=246&amp;amp;face=0_0_602_246&quot;&gt;&lt;a href=&quot;https://ynebula.tistory.com/55&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot; data-source-url=&quot;https://ynebula.tistory.com/55&quot;&gt;
&lt;div class=&quot;og-image&quot; style=&quot;background-image: url('https://scrap.kakaocdn.net/dn/d06cLZ/hyELSuHROO/bXseCpYWeZssFbDPhVQTQ0/img.png?width=602&amp;amp;height=246&amp;amp;face=0_0_602_246,https://scrap.kakaocdn.net/dn/YiDa7/hyELPY3hfb/83oJou7B5QXwFkCmGrl0q0/img.png?width=602&amp;amp;height=246&amp;amp;face=0_0_602_246,https://scrap.kakaocdn.net/dn/bz5OTY/hyELRigXfU/PxWrvr2ZrZJY2yB9hQNpT1/img.png?width=602&amp;amp;height=246&amp;amp;face=0_0_602_246');&quot;&gt;&amp;nbsp;&lt;/div&gt;
&lt;div class=&quot;og-text&quot;&gt;
&lt;p class=&quot;og-title&quot;&gt;BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Korean Translation of the Paper - 3&lt;/p&gt;
&lt;p class=&quot;og-desc&quot;&gt;This post is a mixed literal and free translation of the BERT paper. 3. BERT We introduce BERT and its detailed implementation in this section. There are two steps (pre-training and fine-tuning). During pre-training, various pre-train..&lt;/p&gt;
&lt;p class=&quot;og-host&quot;&gt;ynebula.tistory.com&lt;/p&gt;
&lt;/div&gt;
&lt;/a&gt;&lt;/figure&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;</description>
      <category>Artificial Intelligence</category>
      <category>BERT</category>
      <category>nlp</category>
      <author>[성운]</author>
      <guid isPermaLink="true">https://ynebula.tistory.com/54</guid>
      <comments>https://ynebula.tistory.com/54#entry54comment</comments>
      <pubDate>Mon, 27 Jan 2020 20:57:59 +0900</pubDate>
    </item>
  </channel>
</rss>