Chapter 26 Word clouds in R
## [1] "2024-11-07"
26.1 Example #1
To generate word clouds in R you need the packages wordcloud, RColorBrewer and wordcloud2. The text-cleaning steps also use tm and SnowballC.
This example is a copy of the following web page
if (!require("pacman")) install.packages("pacman")
pacman::p_load(tm, SnowballC, wordcloud, RColorBrewer, wordcloud2)
library(wordcloud) # a package for drawing word clouds
library(wordcloud2) # a simpler package for drawing word clouds
library(RColorBrewer) # colour palettes
library(tm) # text mining
library(SnowballC) # word stemming, with support for languages other than English
26.2 Using the demoFreq data set included in the wordcloud2 package
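The chunk that printed the listing in this section is not shown. A minimal sketch of how the data could be inspected and plotted, assuming wordcloud2 is installed; `nube_demo` and the `size` value are illustrative choices, not taken from the original.

```r
library(wordcloud2)

# demoFreq is an example data frame bundled with wordcloud2,
# with one column of words and one column of frequencies
head(demoFreq, 10)   # first 10 rows

# Draw an interactive word cloud directly from the data frame
# (size = 0.8 is an arbitrary illustrative value)
nube_demo <- wordcloud2(data = demoFreq, size = 0.8)
nube_demo
```

Typing `demoFreq` on its own prints every row of the data frame, while `head()` limits the output to the first few rows.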
## word freq
## oil oil 85
## said said 73
## prices prices 48
## opec opec 42
## mln mln 31
## the the 26
## last last 24
## bpd bpd 23
## dlrs dlrs 23
## crude crude 21
## market market 20
## reuter reuter 20
## saudi saudi 18
## will will 18
## one one 17
## barrel barrel 15
## kuwait kuwait 14
## new new 14
## official official 14
## pct pct 14
## price price 13
## barrels barrels 11
## government government 11
## production production 11
## sheikh sheikh 11
## industry industry 10
## meeting meeting 10
## minister minister 10
## world world 10
## also also 9
## billion billion 9
## futures futures 9
## month month 9
## output output 9
## petroleum petroleum 9
## quota quota 9
## sources sources 9
## accord accord 8
## analysts analysts 8
## but but 8
## group group 8
## gulf gulf 8
## january january 8
## markets markets 8
## report report 8
## today today 8
## december december 7
## demand demand 7
## economic economic 7
## economy economy 7
## energy energy 7
## help help 7
## international international 7
## may may 7
## nazer nazer 7
## nymex nymex 7
## posted posted 7
## present present 7
## they they 7
## traders traders 7
## 158 158 6
## ability ability 6
## agreement agreement 6
## ali ali 6
## april april 6
## arabia arabia 6
## budget budget 6
## emergency emergency 6
## exchange exchange 6
## hold hold 6
## imports imports 6
## members members 6
## recent recent 6
## riyals riyals 6
## says says 6
## sell sell 6
## years years 6
## abdulaziz abdulaziz 5
## agency agency 5
## arab arab 5
## ceiling ceiling 5
## company company 5
## contract contract 5
## daily daily 5
## emirates emirates 5
## expected expected 5
## expenditure expenditure 5
## exports exports 5
## growth growth 5
## now now 5
## opecs opecs 5
## plans plans 5
## qatar qatar 5
## quoted quoted 5
## research research 5
## reserve reserve 5
## reserves reserves 5
## states states 5
## study study 5
## united united 5
## way way 5
## week week 5
## west west 5
## year year 5
## 150 150 4
## according according 4
## added added 4
## among among 4
## asked asked 4
## brings brings 4
## buyers buyers 4
## can can 4
## change change 4
## corp corp 4
## cut cut 4
## day day 4
## development development 4
## effective effective 4
## exploration exploration 4
## fall fall 4
## february february 4
## fell fell 4
## free free 4
## higher higher 4
## increase increase 4
## levels levels 4
## meet meet 4
## must must 4
## never never 4
## per per 4
## policy policy 4
## power power 4
## problem problem 4
## producing producing 4
## protect protect 4
## pumping pumping 4
## reduced reduced 4
## revenue revenue 4
## revenues revenues 4
## rise rise 4
## saying saying 4
## set set 4
## since since 4
## spa spa 4
## strategic strategic 4
## texas texas 4
## this this 4
## total total 4
## transaction transaction 4
## weak weak 4
## yesterday yesterday 4
## 13nation 13nation 3
## 1985 1985 3
## 198586 198586 3
## 1986 1986 3
## 1987 1987 3
## 198788 198788 3
## address address 3
## agreed agreed 3
## alkhalifa alkhalifa 3
## alqabas alqabas 3
## alsabah alsabah 3
## announced announced 3
## appears appears 3
## arabian arabian 3
## arabias arabias 3
## around around 3
## average average 3
## back back 3
## bank bank 3
## bbl bbl 3
## boost boost 3
## called called 3
## commitment commitment 3
## companies companies 3
## compared compared 3
## countrys countrys 3
## current current 3
## decrease decrease 3
## denied denied 3
## dollars dollars 3
## domestic domestic 3
## due due 3
## efp efp 3
## estimate estimate 3
## estimates estimates 3
## export export 3
## fixed fixed 3
## foreign foreign 3
## four four 3
## future future 3
## grade grade 3
## guard guard 3
## high high 3
## hisham hisham 3
## however however 3
## inc inc 3
## indonesia indonesia 3
## indonesias indonesias 3
## intermediate intermediate 3
## kingdoms kingdoms 3
## kuwaits kuwaits 3
## local local 3
## low low 3
## lower lower 3
## lowered lowered 3
## made made 3
## main main 3
## march march 3
## mckiernan mckiernan 3
## measures measures 3
## mizrahi mizrahi 3
## moves moves 3
## much much 3
## named named 3
## next next 3
## officials officials 3
## plant plant 3
## position position 3
## president president 3
## pressure pressure 3
## producer producer 3
## projected projected 3
## published published 3
## real real 3
## refinery refinery 3
## reiterated reiterated 3
## remain remain 3
## rule rule 3
## say say 3
## sector sector 3
## several several 3
## sharp sharp 3
## ship ship 3
## six six 3
## slightly slightly 3
## smaller smaller 3
## spokeswoman spokeswoman 3
## state state 3
## sweet sweet 3
## texaco texaco 3
## three three 3
## trading trading 3
## two two 3
## weeks weeks 3
## york york 3
## 1518 1518 2
## 198687 198687 2
## 20s 20s 2
## 285000 285000 2
## 500 500 2
## 725 725 2
## 750 750 2
## 948000 948000 2
## activity activity 2
## adhering adhering 2
## agriculture agriculture 2
## along along 2
## apparently apparently 2
## appeared appeared 2
## approved approved 2
## architect architect 2
## areas areas 2
## aspen aspen 2
## bankers bankers 2
## based based 2
## benchmark benchmark 2
## benefits benefits 2
## canada canada 2
## canadian canadian 2
## cash cash 2
## changed changed 2
## changes changes 2
## circumstance circumstance 2
## clearly clearly 2
## closed closed 2
## coast coast 2
## come come 2
## committee committee 2
## commodity commodity 2
## consumption consumption 2
## countries countries 2
## country country 2
## crossroads crossroads 2
## crucial crucial 2
## cubic cubic 2
## currently currently 2
## customers customers 2
## deal deal 2
## decembers decembers 2
## decline decline 2
## deficit deficit 2
## deposits deposits 2
## deputy deputy 2
## diamond diamond 2
## differentials differentials 2
## difficulties difficulties 2
## direction direction 2
## discounted discounted 2
## dlr dlr 2
## early early 2
## ecuador ecuador 2
## effect effect 2
## embargo embargo 2
## embassy embassy 2
## entering entering 2
## even even 2
## falling falling 2
## feb feb 2
## first first 2
## fiscal fiscal 2
## fiscales fiscales 2
## forced forced 2
## fully fully 2
## gcc gcc 2
## general general 2
## given given 2
## grades grades 2
## halt halt 2
## hit hit 2
## hope hope 2
## humanistic humanistic 2
## impact impact 2
## import import 2
## increased increased 2
## increasing increasing 2
## instead instead 2
## institute institute 2
## interview interview 2
## investment investment 2
## jersey jersey 2
## june june 2
## keep keep 2
## late late 2
## light light 2
## limit limit 2
## limits limits 2
## line line 2
## lost lost 2
## louisiana louisiana 2
## lowest lowest 2
## major major 2
## marketing marketing 2
## member member 2
## mid mid 2
## million million 2
## ministers ministers 2
## mitigate mitigate 2
## months months 2
## nearing nearing 2
## net net 2
## neutral neutral 2
## none none 2
## nuclear nuclear 2
## organisation organisation 2
## organization organization 2
## outlook outlook 2
## overseas overseas 2
## pact pact 2
## pay pay 2
## petroliferos petroliferos 2
## planned planned 2
## port port 2
## positive positive 2
## postings postings 2
## predicted predicted 2
## press press 2
## pricing pricing 2
## private private 2
## probably probably 2
## problems problems 2
## pronounced pronounced 2
## protected protected 2
## public public 2
## put put 2
## quotes quotes 2
## raise raise 2
## rate rate 2
## reduction reduction 2
## referring referring 2
## remarks remarks 2
## reports reports 2
## return return 2
## review review 2
## risks risks 2
## riyal riyal 2
## selfimposed selfimposed 2
## selling selling 2
## share share 2
## shortfall shortfall 2
## spot spot 2
## steady steady 2
## stick stick 2
## strongly strongly 2
## studies studies 2
## support support 2
## take take 2
## taken taken 2
## techniques techniques 2
## throughput throughput 2
## trade trade 2
## trust trust 2
## trying trying 2
## uncertainty uncertainty 2
## union union 2
## value value 2
## wam wam 2
## wanted wanted 2
## weakness weakness 2
## winter winter 2
## yacimientos yacimientos 2
## yanbu yanbu 2
## zero zero 2
## zone zone 2
## 100000 100000 1
## 108 108 1
## 111 111 1
## 115 115 1
## 12217 12217 1
## 1232 1232 1
## 1381 1381 1
## 13member 13member 1
## 156 156 1
## 1600 1600 1
## 1635 1635 1
## 1650 1650 1
## 1667 1667 1
## 168 168 1
## 1685 1685 1
## 1752 1752 1
## 180000 180000 1
## 200000 200000 1
## 200foot 200foot 1
## 2226 2226 1
## 24hour 24hour 1
## 2766 2766 1
## 300 300 1
## 3749598 3749598 1
## 3750003 3750003 1
## 4133 4133 1
## 534 534 1
## 5472 5472 1
## 614 614 1
## 658 658 1
## 6745 6745 1
## 678 678 1
## 718 718 1
## 738 738 1
## able able 1
## abroad abroad 1
## accept accept 1
## across across 1
## add add 1
## addressed addressed 1
## adherence adherence 1
## advantage advantage 1
## advisers advisers 1
## after after 1
## agricultural agricultural 1
## aground aground 1
## allocated allocated 1
## allocations allocations 1
## allow allow 1
## almost almost 1
## already already 1
## althani althani 1
## although although 1
## alvite alvite 1
## amidst amidst 1
## analysis analysis 1
## analyst analyst 1
## annual annual 1
## anything anything 1
## apparent apparent 1
## aramco aramco 1
## argentine argentine 1
## arrangement arrangement 1
## asia asia 1
## asian asian 1
## assesses assesses 1
## assign assign 1
## assigned assigned 1
## associates associates 1
## attract attract 1
## available available 1
## averaging averaging 1
## aware aware 1
## bahrain bahrain 1
## bahrains bahrains 1
## balance balance 1
## baseless baseless 1
## basic basic 1
## basis basis 1
## because because 1
## beginning beginning 1
## bijan bijan 1
## bin bin 1
## bit bit 1
## bodys bodys 1
## briefly briefly 1
## broadened broadened 1
## brothers brothers 1
## buildings buildings 1
## burden burden 1
## buy buy 1
## buyer buyer 1
## calendar calendar 1
## cambridge cambridge 1
## capacity capacity 1
## capozza capozza 1
## carrying carrying 1
## center center 1
## century century 1
## cera cera 1
## certain certain 1
## cftc cftc 1
## chairman chairman 1
## challenge challenge 1
## characterized characterized 1
## charging charging 1
## cheap cheap 1
## cheating cheating 1
## chevron chevron 1
## chv chv 1
## cited cited 1
## citing citing 1
## clever clever 1
## close close 1
## closer closer 1
## closes closes 1
## coming coming 1
## commission commission 1
## communications communications 1
## companys companys 1
## completed completed 1
## complex complex 1
## condition condition 1
## conditions conditions 1
## considered considered 1
## construction construction 1
## contacts contacts 1
## continuation continuation 1
## continue continue 1
## continued continued 1
## continues continues 1
## contracted contracted 1
## contributed contributed 1
## control control 1
## cooperation cooperation 1
## coordination coordination 1
## copany copany 1
## corps corps 1
## council council 1
## counter counter 1
## coupled coupled 1
## covered covered 1
## creek creek 1
## critical critical 1
## cts cts 1
## currency currency 1
## custom custom 1
## cuts cuts 1
## cutting cutting 1
## cypriot cypriot 1
## daniel daniel 1
## david david 1
## days days 1
## debtburdened debtburdened 1
## debut debut 1
## decided decided 1
## declared declared 1
## declines declines 1
## deemed deemed 1
## defence defence 1
## delaware delaware 1
## delivered delivered 1
## delivering delivering 1
## departments departments 1
## deregulate deregulate 1
## deregulation deregulation 1
## determination determination 1
## devalue devalue 1
## device device 1
## differential differential 1
## difficulty difficulty 1
## dillard dillard 1
## director director 1
## discuss discuss 1
## discussing discussing 1
## distribution distribution 1
## distributions distributions 1
## divided divided 1
## doha doha 1
## dollar dollar 1
## drawbacks drawbacks 1
## drop drop 1
## dropped dropped 1
## earlier earlier 1
## earnings earnings 1
## eastern eastern 1
## easy easy 1
## ecuadors ecuadors 1
## editor editor 1
## edmontonswann edmontonswann 1
## education education 1
## eight eight 1
## either either 1
## elaborate elaborate 1
## elections elections 1
## electricity electricity 1
## end end 1
## engineers engineers 1
## entitlements entitlements 1
## environment environment 1
## equally equally 1
## estimated estimated 1
## european european 1
## exceed exceed 1
## exceeding exceeding 1
## excess excess 1
## excesses excesses 1
## excessive excessive 1
## exerted exerted 1
## exist exist 1
## expanded expanded 1
## expansion expansion 1
## expartners expartners 1
## expectations expectations 1
## expects expects 1
## explained explained 1
## exporting exporting 1
## exxon exxon 1
## face face 1
## faced faced 1
## faces faces 1
## facilities facilities 1
## facing facing 1
## failed failed 1
## fallen fallen 1
## favours favours 1
## fee fee 1
## fernando fernando 1
## figure figure 1
## figures figures 1
## finance finance 1
## firmer firmer 1
## floating floating 1
## followed followed 1
## for for 1
## forces forces 1
## foremost foremost 1
## fourth fourth 1
## frank frank 1
## full full 1
## fundamentals fundamentals 1
## gas gas 1
## generally generally 1
## geneva geneva 1
## get get 1
## globalization globalization 1
## glut glut 1
## guaranteed guaranteed 1
## halting halting 1
## harvard harvard 1
## heads heads 1
## health health 1
## hedge hedge 1
## hedged hedged 1
## helped helped 1
## hemisphere hemisphere 1
## highly highly 1
## hills hills 1
## hitting hitting 1
## hoped hoped 1
## housing housing 1
## houston houston 1
## immediately immediately 1
## implementation implementation 1
## improve improve 1
## improvement improvement 1
## include include 1
## including including 1
## independent independent 1
## indications indications 1
## initiate initiate 1
## initiative initiative 1
## institutions institutions 1
## interbank interbank 1
## interest interest 1
## investments investments 1
## issue issue 1
## jamaica jamaica 1
## jan jan 1
## juaymah juaymah 1
## jubail jubail 1
## jump jump 1
## just just 1
## khalifa khalifa 1
## lack lack 1
## largest largest 1
## later later 1
## latest latest 1
## launched launched 1
## lead lead 1
## leading leading 1
## learn learn 1
## least least 1
## lending lending 1
## less less 1
## lesson lesson 1
## level level 1
## liberalised liberalised 1
## lift lift 1
## liftings liftings 1
## like like 1
## lines lines 1
## liquidity liquidity 1
## little little 1
## loan loan 1
## lodged lodged 1
## longterm longterm 1
## ltd ltd 1
## lukman lukman 1
## lull lull 1
## maintain maintain 1
## make make 1
## manager manager 1
## manipulate manipulate 1
## many many 1
## marathon marathon 1
## marathons marathons 1
## marker marker 1
## mcfadden mcfadden 1
## means means 1
## mercantile mercantile 1
## metrers metrers 1
## metres metres 1
## mid1960s mid1960s 1
## mid1986 mid1986 1
## mideast mideast 1
## minus minus 1
## mlotok mlotok 1
## mob mob 1
## mobil mobil 1
## momentum momentum 1
## money money 1
## monopolies monopolies 1
## monthend monthend 1
## moussavarrahmani moussavarrahmani 1
## movement movement 1
## nation nation 1
## natural natural 1
## need need 1
## needs needs 1
## negative negative 1
## negotiate negotiate 1
## neither neither 1
## network network 1
## news news 1
## newspaper newspaper 1
## nigerian nigerian 1
## nine nine 1
## nonoil nonoil 1
## northern northern 1
## notes notes 1
## offset offset 1
## onetwelfth onetwelfth 1
## oneweek oneweek 1
## open open 1
## opens opens 1
## operations operations 1
## opposite opposite 1
## optimism optimism 1
## optimistic optimistic 1
## option option 1
## order order 1
## organiaation organiaation 1
## our our 1
## outlining outlining 1
## outside outside 1
## overproducing overproducing 1
## part part 1
## parties parties 1
## partly partly 1
## party party 1
## past past 1
## paths paths 1
## paul paul 1
## paulsboro paulsboro 1
## payments payments 1
## pegged pegged 1
## performance performance 1
## period period 1
## pertains pertains 1
## pessimistic pessimistic 1
## philadelphia philadelphia 1
## physical physical 1
## placed placed 1
## plastics plastics 1
## platinum platinum 1
## point point 1
## policies policies 1
## political political 1
## population population 1
## positions positions 1
## postponed postponed 1
## pressures pressures 1
## previous previous 1
## primarily primarily 1
## primary primary 1
## principal principal 1
## procedure procedure 1
## produce produce 1
## produced produced 1
## producers producers 1
## product product 1
## products products 1
## program program 1
## projection projection 1
## projects projects 1
## prompted prompted 1
## proposed proposed 1
## proved proved 1
## providing providing 1
## provision provision 1
## publish publish 1
## purposes purposes 1
## quarter quarter 1
## quiet quiet 1
## quietly quietly 1
## quotas quotas 1
## rallied rallied 1
## ran ran 1
## ranging ranging 1
## ras ras 1
## rates rates 1
## rationalise rationalise 1
## readdress readdress 1
## reaffirmed reaffirmed 1
## reasonable reasonable 1
## recommending recommending 1
## recovering recovering 1
## recurrent recurrent 1
## reference reference 1
## refineries refineries 1
## refining refining 1
## reflect reflect 1
## refloat refloat 1
## reform reform 1
## reforms reforms 1
## regain regain 1
## regard regard 1
## regarding regarding 1
## region region 1
## regional regional 1
## reiterate reiterate 1
## relaxation relaxation 1
## relieve relieve 1
## reluctant reluctant 1
## remainder remainder 1
## remained remained 1
## reported reported 1
## request request 1
## resistance resistance 1
## resisting resisting 1
## resources resources 1
## respectively respectively 1
## responsibilites responsibilites 1
## restored restored 1
## restraint restraint 1
## restrictions restrictions 1
## result result 1
## reuters reuters 1
## revealed revealed 1
## reviews reviews 1
## right right 1
## rilwanu rilwanu 1
## rising rising 1
## river river 1
## rocks rocks 1
## rosemary rosemary 1
## rumour rumour 1
## rushing rushing 1
## sales sales 1
## salomon salomon 1
## santos santos 1
## satisfied satisfied 1
## saw saw 1
## scheduled scheduled 1
## scheme scheme 1
## seapride seapride 1
## season season 1
## secretary secretary 1
## security security 1
## seeing seeing 1
## seek seek 1
## september september 1
## series series 1
## serve serve 1
## services services 1
## session session 1
## sets sets 1
## seven seven 1
## severely severely 1
## shamrock shamrock 1
## sharply sharply 1
## shoulder shoulder 1
## show show 1
## showed showed 1
## shown shown 1
## signed signed 1
## signs signs 1
## situation situation 1
## sixmonth sixmonth 1
## slackens slackens 1
## slide slide 1
## slump slump 1
## social social 1
## sold sold 1
## soon soon 1
## sort sort 1
## sour sour 1
## south south 1
## southeast southeast 1
## spend spend 1
## spill spill 1
## spoke spoke 1
## spokesman spokesman 1
## spotnext spotnext 1
## spriggs spriggs 1
## stabilise stabilise 1
## stabilize stabilize 1
## stable stable 1
## start start 1
## statement statement 1
## steel steel 1
## steering steering 1
## steps steps 1
## stiff stiff 1
## storage storage 1
## strong strong 1
## subsequently subsequently 1
## substitution substitution 1
## succeed succeed 1
## suffer suffer 1
## suffering suffering 1
## suhartos suhartos 1
## sunday sunday 1
## sundays sundays 1
## supply supply 1
## supporting supporting 1
## suppose suppose 1
## surrounding surrounding 1
## swift swift 1
## talks talks 1
## tanker tanker 1
## tanurah tanurah 1
## tapers tapers 1
## teach teach 1
## telephone telephone 1
## terminals terminals 1
## test test 1
## there there 1
## third third 1
## thomas thomas 1
## though though 1
## thought thought 1
## threemonth threemonth 1
## tide tide 1
## together together 1
## told told 1
## totalled totalled 1
## tower tower 1
## trader trader 1
## trades trades 1
## traditional traditional 1
## traditionally traditionally 1
## transacted transacted 1
## transmission transmission 1
## transport transport 1
## trends trends 1
## trusts trusts 1
## try try 1
## turmoil turmoil 1
## twofold twofold 1
## uae uae 1
## uncertain uncertain 1
## unchanged unchanged 1
## under under 1
## unions unions 1
## unitholders unitholders 1
## universitys universitys 1
## unlikely unlikely 1
## unocal unocal 1
## unusually unusually 1
## urged urged 1
## use use 1
## virtual virtual 1
## wants wants 1
## water water 1
## wealth wealth 1
## wednesday wednesday 1
## weekend weekend 1
## welcomed welcomed 1
## when when 1
## whether whether 1
## wishes wishes 1
## worldwide worldwide 1
## xon xon 1
## yergin yergin 1
## yesterdays yesterdays 1
## word freq
## oil oil 85
## said said 73
## prices prices 48
## opec opec 42
## mln mln 31
## the the 26
## last last 24
## bpd bpd 23
## dlrs dlrs 23
## crude crude 21
26.3 Step 1
Import the data from the web
filePath <- "http://www.sthda.com/sthda/RDoc/example-files/martin-luther-king-i-have-a-dream-speech.txt"
text <- readLines(filePath)
text
## [1] ""
## [2] "And so even though we face the difficulties of today and tomorrow, I still have a dream. It is a dream deeply rooted in the American dream."
## [3] " "
## [4] "I have a dream that one day this nation will rise up and live out the true meaning of its creed:"
## [5] " "
## [6] "We hold these truths to be self-evident, that all men are created equal."
## [7] " "
## [8] "I have a dream that one day on the red hills of Georgia, the sons of former slaves and the sons of former slave owners will be able to sit down together at the table of brotherhood."
## [9] " "
## [10] "I have a dream that one day even the state of Mississippi, a state sweltering with the heat of injustice, sweltering with the heat of oppression, will be transformed into an oasis of freedom and justice."
## [11] " "
## [12] "I have a dream that my four little children will one day live in a nation where they will not be judged by the color of their skin but by the content of their character."
## [13] " "
## [14] "I have a dream today!"
## [15] " "
## [16] "I have a dream that one day, down in Alabama, with its vicious racists, with its governor having his lips dripping with the words of interposition and nullification, one day right there in Alabama little black boys and black girls will be able to join hands with little white boys and white girls as sisters and brothers."
## [17] " "
## [18] "I have a dream today!"
## [19] " "
## [20] "I have a dream that one day every valley shall be exalted, and every hill and mountain shall be made low, the rough places will be made plain, and the crooked places will be made straight; and the glory of the Lord shall be revealed and all flesh shall see it together."
## [21] " "
## [22] "This is our hope, and this is the faith that I go back to the South with."
## [23] " "
## [24] "With this faith, we will be able to hew out of the mountain of despair a stone of hope. With this faith, we will be able to transform the jangling discords of our nation into a beautiful symphony of brotherhood. With this faith, we will be able to work together, to pray together, to struggle together, to go to jail together, to stand up for freedom together, knowing that we will be free one day."
## [25] " "
## [26] "And this will be the day, this will be the day when all of God s children will be able to sing with new meaning:"
## [27] " "
## [28] "My country tis of thee, sweet land of liberty, of thee I sing."
## [29] "Land where my fathers died, land of the Pilgrim s pride,"
## [30] "From every mountainside, let freedom ring!"
## [31] "And if America is to be a great nation, this must become true."
## [32] "And so let freedom ring from the prodigious hilltops of New Hampshire."
## [33] "Let freedom ring from the mighty mountains of New York."
## [34] "Let freedom ring from the heightening Alleghenies of Pennsylvania."
## [35] "Let freedom ring from the snow-capped Rockies of Colorado."
## [36] "Let freedom ring from the curvaceous slopes of California."
## [37] " "
## [38] "But not only that:"
## [39] "Let freedom ring from Stone Mountain of Georgia."
## [40] "Let freedom ring from Lookout Mountain of Tennessee."
## [41] "Let freedom ring from every hill and molehill of Mississippi."
## [42] "From every mountainside, let freedom ring."
## [43] "And when this happens, when we allow freedom ring, when we let it ring from every village and every hamlet, from every state and every city, we will be able to speed up that day when all of God s children, black men and white men, Jews and Gentiles, Protestants and Catholics, will be able to join hands and sing in the words of the old Negro spiritual:"
## [44] "Free at last! Free at last!"
## [45] " "
## [46] "Thank God Almighty, we are free at last!"
You can also import a text stored on your computer in .txt format. The MS Word .doc format will not work.
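A minimal sketch of reading a local .txt file with base R. The file name and its contents are hypothetical; the file is written to a temporary folder only so that the example is self-contained.

```r
# Hypothetical local file, created here so the example can run on its own
ruta_local <- file.path(tempdir(), "mi_texto.txt")
writeLines(c("I have a dream today!", "Let freedom ring."), ruta_local)

# Read the file line by line; each line becomes one element of a character vector
texto_local <- readLines(ruta_local, encoding = "UTF-8")
texto_local
```

For a file of your own, replace `ruta_local` with its path, or use `readLines(file.choose())` to pick the file interactively.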
26.4 Load the text as a Corpus
mi_texto <- iconv(text, "WINDOWS-1252", "UTF-8") # convert the encoding from Windows-1252 to UTF-8 so accented and non-English characters are read correctly
# Load the data as a corpus
docs <- Corpus(VectorSource(mi_texto))
docs
## <<SimpleCorpus>>
## Metadata: corpus specific: 1, document level (indexed): 0
## Content: documents: 46
26.6 Transform the text: replace some special characters with blank spaces
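The code for this step is not shown here. One common way to do it with tm is to wrap `gsub()` in `content_transformer()`. This is a sketch on a toy corpus; the helper name `toSpace`, the object `docs_demo`, and the characters chosen are illustrative assumptions.

```r
library(tm)

# Toy corpus containing special characters (illustrative text, not the speech)
docs_demo <- VCorpus(VectorSource("freedom/justice@hope|faith"))

# Transformer that replaces every match of `pattern` with a blank space
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))

docs_demo <- tm_map(docs_demo, toSpace, "/")
docs_demo <- tm_map(docs_demo, toSpace, "@")
docs_demo <- tm_map(docs_demo, toSpace, "\\|")  # "|" must be escaped in a regular expression

as.character(docs_demo[[1]])
```

The same `toSpace` calls could be applied to `docs` before the other cleaning steps.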
26.7 The tm package is for text mining
# Convert the text to lower case
docs <- tm_map(docs, content_transformer(tolower))
# Remove numbers
docs <- tm_map(docs, removeNumbers)
# Remove english common stopwords
stopwords("english") # list of common english stopwords that are often removed
## [1] "i" "me" "my" "myself" "we"
## [6] "our" "ours" "ourselves" "you" "your"
## [11] "yours" "yourself" "yourselves" "he" "him"
## [16] "his" "himself" "she" "her" "hers"
## [21] "herself" "it" "its" "itself" "they"
## [26] "them" "their" "theirs" "themselves" "what"
## [31] "which" "who" "whom" "this" "that"
## [36] "these" "those" "am" "is" "are"
## [41] "was" "were" "be" "been" "being"
## [46] "have" "has" "had" "having" "do"
## [51] "does" "did" "doing" "would" "should"
## [56] "could" "ought" "i'm" "you're" "he's"
## [61] "she's" "it's" "we're" "they're" "i've"
## [66] "you've" "we've" "they've" "i'd" "you'd"
## [71] "he'd" "she'd" "we'd" "they'd" "i'll"
## [76] "you'll" "he'll" "she'll" "we'll" "they'll"
## [81] "isn't" "aren't" "wasn't" "weren't" "hasn't"
## [86] "haven't" "hadn't" "doesn't" "don't" "didn't"
## [91] "won't" "wouldn't" "shan't" "shouldn't" "can't"
## [96] "cannot" "couldn't" "mustn't" "let's" "that's"
## [101] "who's" "what's" "here's" "there's" "when's"
## [106] "where's" "why's" "how's" "a" "an"
## [111] "the" "and" "but" "if" "or"
## [116] "because" "as" "until" "while" "of"
## [121] "at" "by" "for" "with" "about"
## [126] "against" "between" "into" "through" "during"
## [131] "before" "after" "above" "below" "to"
## [136] "from" "up" "down" "in" "out"
## [141] "on" "off" "over" "under" "again"
## [146] "further" "then" "once" "here" "there"
## [151] "when" "where" "why" "how" "all"
## [156] "any" "both" "each" "few" "more"
## [161] "most" "other" "some" "such" "no"
## [166] "nor" "not" "only" "own" "same"
## [171] "so" "than" "too" "very"
docs <- tm_map(docs, removeWords, stopwords("english"))
# Remove your own stop word
# specify your stopwords as a character vector
docs <- tm_map(docs, removeWords, c("blabla1", "blabla2"))
# Remove punctuation
docs <- tm_map(docs, removePunctuation)
# Eliminate extra white spaces
docs <- tm_map(docs, stripWhitespace)
# Text stemming
# docs <- tm_map(docs, stemDocument)
#stopwords() # Here are all the stopwords in the function **stopwords**
26.8 Create a matrix of the words in my document
dtm <- TermDocumentMatrix(docs) # build the term-document matrix (one row per word, one column per document)
m <- as.matrix(dtm) # convert it to an ordinary matrix
v <- sort(rowSums(m), decreasing = TRUE) # total count per word, sorted from most to least frequent
d <- data.frame(word = names(v), freq = v) # new data frame with the words and their frequencies
head(d, n = 10) # the 10 most common words
## word freq
## will will 17
## freedom freedom 13
## ring ring 12
## dream dream 11
## day day 11
## let let 11
## every every 9
## one one 8
## able able 8
## together together 7
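To make the matrix step concrete, here is a small sketch on a three-document toy corpus (illustrative sentences, not the speech). It shows that `rowSums()` adds up each word's counts across documents; the object names ending in `_demo` are assumptions for this example.

```r
library(tm)

# Three tiny documents
corpus_demo <- VCorpus(VectorSource(c("let freedom ring",
                                      "freedom ring",
                                      "freedom")))

tdm_demo <- TermDocumentMatrix(corpus_demo)  # rows = words, columns = documents
m_demo   <- as.matrix(tdm_demo)
m_demo                                       # counts per word and document

# Sum across documents and sort from most to least frequent
v_demo <- sort(rowSums(m_demo), decreasing = TRUE)
v_demo
```

By default `TermDocumentMatrix()` only keeps words of at least three characters, so very short words do not appear in the result.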
26.8.2 How would you remove from the data frame all words with a count of 3 or less?
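One possible answer, sketched with base R on a toy data frame that has the same columns as `d`. The values and the names `d_demo` and `d_filtrado` are illustrative.

```r
# Toy frequency table with the same columns as `d`
d_demo <- data.frame(word = c("will", "freedom", "ring", "hope", "faith"),
                     freq = c(17, 13, 12, 3, 2))

# Keep only the rows whose count is strictly greater than 3
d_filtrado <- d_demo[d_demo$freq > 3, ]
d_filtrado
```

Applied to the real data this would be `d[d$freq > 3, ]`; `subset(d, freq > 3)` is an equivalent alternative.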
From the wordcloud package
#head(d)
wordcloud(words = d$word, # the words
          freq = d$freq, # their frequencies
          min.freq = 10, # words with a lower frequency are not plotted
          max.words = 200, # maximum number of words to plot
          random.order = TRUE, # plot the words in random order
          rot.per = 0.35, # proportion of words rotated 90 degrees
          colors = brewer.pal(8, "Dark2")) # colour palette
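Because wordcloud places words using random numbers, the layout changes on every run. A sketch of how to make the figure reproducible by fixing the random seed, using a toy frequency table; the seed value and the data are arbitrary assumptions.

```r
library(wordcloud)
library(RColorBrewer)

# Toy frequency table (illustrative values)
d_demo <- data.frame(word = c("freedom", "dream", "ring", "day", "hope"),
                     freq = c(13, 11, 12, 11, 4))

paleta <- brewer.pal(8, "Dark2")   # 8 colours from the "Dark2" palette

set.seed(1234)                     # arbitrary seed; any fixed value works
wordcloud(words = d_demo$word, freq = d_demo$freq,
          min.freq = 1,            # keep every word of the toy table
          random.order = FALSE,    # most frequent words are placed first, in the centre
          colors = paleta)
```

With `random.order = FALSE` the most frequent words tend to sit in the middle of the cloud, which is often easier to read than a fully random arrangement.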
26.10 How to remove words in another language
26.10.1 See this link for the stopwords of many languages
https://cran.r-project.org/web/packages/stopwords/readme/README.html
26.11 In Spanish
# from CRAN
install.packages("stopwords")
# Or get the development version from GitHub:
# install.packages("devtools")
# devtools::install_github("quanteda/stopwords")
26.11.1 The first 30 words in the Spanish stopword list of the “stopwords” package
## [1] "de" "la" "que" "el" "en" "y" "a" "los"
## [9] "del" "se" "las" "por" "un" "para" "con" "no"
## [17] "una" "su" "al" "lo" "como" "más" "pero" "sus"
## [25] "le" "ya" "o" "este" "sí" "porque"
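The call that produced the list above is not shown. A sketch of one that returns the first 30 Spanish stopwords, assuming the stopwords package is installed and that the "snowball" source is the one intended (it is the source used later in this chapter); the object names are illustrative.

```r
# Full Spanish stopword list from the Snowball source
palabras_vacias <- stopwords::stopwords("es", source = "snowball")

# First 30 entries
primeras_30 <- head(palabras_vacias, 30)
primeras_30
```

`stopwords::stopwords_getlanguages("snowball")` lists the language codes available for that source.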
Example #3
Here is a third example.
# install.packages("pacman") # run this line if the pacman package is not installed
library("pacman")
p_load("tm") # text preprocessing
p_load("tidyverse") # functions for manipulating data
p_load("wordcloud") # drawing the word cloud
p_load("RColorBrewer") # colour palettes for the word cloud
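The next section uses an object called `texto` whose creation is not shown. A sketch of one way it could be built, assuming the Spanish document is stored as a plain-text file; the file name and its contents are hypothetical, and the file is written to a temporary folder only to keep the example self-contained.

```r
# Hypothetical file standing in for your own Spanish document
ruta <- file.path(tempdir(), "documento_es.txt")
writeLines(c("La inteligencia artificial estudia sistemas",
             "que resuelven problemas."), ruta)

lineas <- readLines(ruta, encoding = "UTF-8")

# Collapse all lines into one string so the corpus holds a single document
texto <- paste(lineas, collapse = " ")
texto
```

Collapsing the lines is a design choice: it gives a corpus with one document, which matches the "documents: 1" shown in the corpus summary of the next section.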
26.13 Convert your document to a Corpus and declare that it is in Spanish
texto2 <- VCorpus(VectorSource(texto),
readerControl = list(reader = readPlain, language = "es", load=TRUE))
texto2
## <<VCorpus>>
## Metadata: corpus specific: 0, document level (indexed): 0
## Content: documents: 1
26.14 Cleaning the document
- remove numbers
- remove punctuation
- convert to lowercase
- remove common Spanish words (stopwords)
- keep only the stem of each word; for example, the conjugations espero, esperas, espera, esperamos… all become “esper” (this step is commented out in the code below)
- remove extra white space
texto2 <- tm_map(texto2, removeNumbers)
texto2 <- tm_map(texto2, removePunctuation)
texto2 <- tm_map(texto2, content_transformer(tolower)) # content_transformer() keeps each element a valid tm document
texto2 <- tm_map(texto2, removeWords, stopwords::stopwords("es", source = "snowball"))
#texto2 <- tm_map(texto2, stemDocument, language="spanish")
texto2 <- tm_map(texto2, stripWhitespace)
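The stemming step is commented out in the code above. To see what it would do, here is a small sketch that applies the Spanish Snowball stemmer from SnowballC directly to the conjugations mentioned in the list; the object names are illustrative.

```r
library(SnowballC)

# Conjugations of the Spanish verb "esperar"
formas <- c("espero", "esperas", "espera", "esperamos")

# Reduce each form to its stem with the Spanish stemmer
raices <- wordStem(formas, language = "spanish")
raices
```

Stems are often not real words, so a word cloud built from stemmed text can be harder to read. That may be one reason to leave the step disabled.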
26.18 Calculate the frequency of each word
# Assumption: the step that creates tabla_frecuencia is not shown in the source.
# A document-term matrix built from the cleaned corpus provides the $dimnames$Terms and $v elements used below.
tabla_frecuencia <- DocumentTermMatrix(texto2)
tabla_frecuencia <- cbind(palabras = tabla_frecuencia$dimnames$Terms,
                          frecuencia = tabla_frecuencia$v)
# Convert the values bound with cbind into a data frame.
tabla_frecuencia <- as.data.frame(tabla_frecuencia)
# Force the frequency column to hold numeric values.
tabla_frecuencia$frecuencia <- as.numeric(tabla_frecuencia$frecuencia)
# Sort the frequency table from most to least frequent.
tabla_frecuencia <- tabla_frecuencia[order(tabla_frecuencia$frecuencia, decreasing = TRUE), ]
head(tabla_frecuencia) # the 6 most common words in the text
## palabras frecuencia
## 809 inteligencia 79
## 126 artificial 67
## 1378 sistemas 32
## 1347 ser 22
## 737 humano 19
## 1172 problemas 18
wordcloud(words = tabla_frecuencia$palabras,
freq = tabla_frecuencia$frecuencia,
min.freq = 5,
max.words = 100,
random.order = FALSE,
colors = brewer.pal(8,"Paired"))
How to view the figures in the “Plots” tab
That way you can download the word clouds as .pdf or in another format.
<https://stackoverflow.com/questions/40570621/rstudio-how-to-show-plot-output-in-bottom-right-pane>
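As an alternative to exporting from the “Plots” tab, a figure can be written straight to a PDF file with base R graphics devices. A minimal sketch; the output path is hypothetical and a placeholder plot stands in for the `wordcloud()` call.

```r
# Hypothetical output path; replace it with a location of your choice
archivo_pdf <- file.path(tempdir(), "nube_de_palabras.pdf")

pdf(archivo_pdf, width = 7, height = 7)  # open a PDF graphics device
# Placeholder plot; in practice, put the wordcloud(...) call here
plot(1:10, main = "Placeholder for wordcloud()")
dev.off()                                # close the device so the file is written

file.exists(archivo_pdf)
```

Swapping `pdf()` for `png()` (with sizes given in pixels by default) produces an image file instead.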