8.2. Text Mining

In this section, we will learn how to read and explore text data in more depth, using a variety of tools to group and visualize the many kinds of text in a data set. You will probably run into some errors along the way, so we will also look at different ways to fix them.

In this section we are going to explore the text of many speeches given over many years by delegates to the United Nations General Assembly. Every year since 1947, representatives of UN member states gather at the annual sessions of the United Nations General Assembly. The centrepiece of each session is the General Debate. This is a forum at which leaders and other senior officials deliver statements that present their government’s perspective on the major issues in world politics. (Jankin Mikhaylov, Slava; Baturo, Alexander; Dasandi, Niheer, 2017, “United Nations General Debate Corpus”, https://doi.org/10.7910/DVN/0TJX8Y, Harvard Dataverse, V5) This variety of speakers and topics gives us a great opportunity for deeper exploration of text. You can download un-general-debates.csv to get started. It is 130MB, so be prepared for a lot of words! It is licensed CC0, which means it is a public domain data set. We will also make use of a CSV file of country codes.

For your convenience, you can use the Colab notebook TextAnalysisOnUNGeneralDebates.ipynb.

8.2.1. Reading List

8.2.2. Exploratory Questions

  1. How many speeches were given each year?

  2. Make a bar graph of the number of speeches each year.

  3. Which country gave the longest speech (by number of words), and in what year?

  4. Which country has spoken the most times?

  5. Which country has spoken the fewest times?

  6. Make a graph to illustrate the distribution of the number of times each country has spoken.

  7. What were the 25 most commonly used words in the 1970 session?

  8. What were the 25 most commonly used words in the 2015 session?
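Most of these questions come down to a few pandas idioms: grouping and counting rows, measuring the length of each text, and tallying word frequencies. The sketch below runs them on a tiny made-up data frame with the same columns as undf, so it does not give away any answers. Several choices here are assumptions rather than the chapter's method: a "word" is simply a whitespace-separated token, punctuation is stripped with str.translate, and the stopword set is a small hand-picked one. NLTK's stopwords.words('english') (available after nltk.download('stopwords')) would be a fuller list. The same calls should work on undf in place of df.

```python
import string
from collections import Counter

import pandas as pd

# Toy stand-in for undf: same columns, made-up text (hypothetical data)
df = pd.DataFrame({
    "session": [25, 25, 70, 70, 70],
    "year": [1970, 1970, 2015, 2015, 2015],
    "country": ["USA", "FRA", "USA", "KNA", "FRA"],
    "text": [
        "Peace and security for all nations.",
        "The nations of the world seek peace.",
        "Development and climate change demand action.",
        "We call for urgent action on climate change.",
        "Peace requires development.",
    ],
})

# Number of speeches per year; calling .plot.bar() on this Series would draw a bar graph
per_year = df.groupby("year").size()

# Words per speech, treating any whitespace-separated token as a word
df["n_words"] = df["text"].str.split().str.len()
# idxmax returns the row label of the first maximum
longest = df.loc[df["n_words"].idxmax(), ["country", "year", "n_words"]]

# Number of speeches per country, most frequent first
per_country = df["country"].value_counts()

# Hypothetical, hand-picked stopword set (assumption; NLTK offers a fuller list)
STOPWORDS = {"the", "and", "of", "for", "on", "we", "all"}


def top_words(frame, year, n=25):
    """Return the n most common non-stopword tokens in one year's speeches."""
    strip_punct = str.maketrans("", "", string.punctuation)
    counts = Counter()
    for text in frame.loc[frame["year"] == year, "text"]:
        tokens = text.lower().translate(strip_punct).split()
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)


print(per_year)
print(longest)
print(per_country)
print(top_words(df, 2015, n=3))
```

Counting with collections.Counter keeps the word-frequency step in plain Python; for the real corpus, which is much larger, the same loop works but will take noticeably longer.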

%matplotlib inline

import string
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import psycopg2
import textatistic
import seaborn as sbn
from altair import Chart, X, Y, Color, Scale
import altair as alt
from vega_datasets import data
import requests
from bs4 import BeautifulSoup
import nltk
from nltk.corpus import stopwords
matplotlib.style.use('ggplot')  # use the ggplot look for all plots
# Load the debates corpus: one row per speech
undf = pd.read_csv('Data/un-general-debates.csv')
len(undf)
7507
undf.sort_values('year', ascending=False).head()
      session  year country                                               text
5543       70  2015     KNA  It is indeed an honour for me to address this ...
5524       70  2015     KOR  I would first like to congratulate the United ...
5522       70  2015     ARE  It is my pleasure to congratulate Mr. Mogens L...
5521       70  2015     BHS  I bring you, Sir, and the Assembly warm frater...
5520       70  2015     ISL  The world has changed enormously since the est...

As you can see, the data frame is relatively simple. It consists of a session number, a year, a three-letter country code and the text of the speech. Most of each text is hidden because pandas truncates long strings rather than fill the notebook with unreadably long output; in reality, most of the texts are very lengthy! If you want to see an entire string, you can change pandas’ display.max_colwidth option. You’ll probably want to set it back afterward because, as the 1970 speech by the United States below shows, printing the entire text is largely useless.
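If you only need the full text once, pandas’ option_context can lift the limit for a single block and restore the previous value automatically, so there is nothing to set back. A minimal sketch on a made-up Series (the repeated "word" text is hypothetical, not from the corpus):

```python
import pandas as pd

# Hypothetical long string standing in for a speech
long_text = " ".join(["word"] * 40)
s = pd.Series([long_text], name="text")

# Default display: strings longer than display.max_colwidth (50 by default)
# are cut short and end in "..."
truncated = repr(s)

# Inside the with-block the limit is removed; the old value returns on exit
with pd.option_context("display.max_colwidth", None):
    full = repr(s)

print(truncated)
print(full)
print(pd.get_option("display.max_colwidth"))
```

If you do use pd.set_option instead, pd.reset_option('display.max_colwidth') returns the option to its default without having to remember the original value.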

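# None removes the width limit (the older value of -1 is deprecated in recent pandas)
pd.set_option('display.max_colwidth', None)
print(undf[(undf.year == 1970) & (undf.country == 'USA')].text)
# Restore the default truncation width of 50 characters
pd.set_option('display.max_colwidth', 50)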
161    1.\t It is my privilege to extend to you once again the warm
congratulations of the United States delegation on your election to the
Presidency of the twenty-fifth session of the General Assembly. Your country
has few peers in the contributions it has made to the work of this
Organization. Your father was a leader in the League of Nations, and now, in
this session of the General Assembly, you are adding a new chapter in the
family tradition. In the United Nations community we have long known you as
an able spokesman of your country, as a participant in the life of the
United Nations from its very beginning and an authority on the Charter, as a
leading scholar and practitioner in international law, and as one whose
personal outlook is strongly humanitarian and imbued with hope and
determination for a better future. We are indeed fortunate to have you to
lead us in this session devoted to the strengthening and improvement of the
United Nations.\n2.\tDuring this session we shall commemorate with fitting
ceremony one of the most creative and hopeful events of this century, the
founding of the United Nations twenty-five years ago. It is right that this
commemoration should be held. It is far more important, however, that the
nations gathered at this session should so act together as to strengthen the
United Nations for the future, to ensure that we are closer to 'peace,
justice and progress' five, ten and twenty- five years hence than we are
today. The world is unlikely to care much what we say about the past. It
will care a great deal what we do about the future.\n3.\tThe problems the
world faces today are no less grave than those it faced twenty-five years
ago. Indeed they are in many respects far graver. Unless and until the
Members of the United Nations demonstrate their willingness to deal
effectively with these problems, the confidence of the world's peoples in
the United Nations will remain uncertain and precarious.\n4.\tIn the view of
the United States, the most critical international problem today is the same
as that which faced the founders of the United Nations: the need to build a
reliable framework for peace and security among nations. As President Nixon
said in this Assembly a year ago: '... there is no nobler destiny, nor any
greater gift that one age could make to the ages that follow, than to forge
the key to a lasting peace'. [1755th meeting, para. 45.]\n5.\tIn this
conviction the United States has sought in the last two years to frame
foreign policies which would help to make the international scene less
destructively competitive, less strident, more cooperative and orderly. We
have sought, in the spirit of the Charter, to pursue peaceful settlement of
all conflicts and confrontations, and we have urged others to do
likewise.\n6.\tThe most significant of these efforts has been the strategic
arms limitation discussions between my country and the Soviet Union. If at
last we can check the nuclear arms race, the world will have been helped to
move away from its gravest danger and toward new and more rational
relationships.\n7.\tThe United States also seeks, through a process of
serious negotiation, to ease the tensions that arise from the division of
Europe. We have welcomed the efforts of the Federal Republic of Germany to
normalize relations with its eastern neighbors; we are seeking in fourPower
talks to assure the security and wellbeing of the residents of Berlin; we
have joined our NATO allies in proposing exploratory talks on the reduction
of forces in central Europe. We hope these efforts will find the Soviet
Union and its allies forthcoming.\n8.\tIn VietNam, over eighteen months ago,
the United States reversed the trend in American involvement in the war. We
began a steady, uninterrupted reduction in the number of our troops. We have
made a negotiated peace our highest priority. Although North VietNam has as
yet shown no serious intention to negotiate a solution under which the South
VietNamese people can freely decide their own future, this remains our goal
and we shall not be discouraged in seeking it.\nA/PV.1854\n9.\tMy Government
has likewise sought ways to move toward a constructive relationship with
mainland China. To this end, we have taken a number of unilateral steps to
ease tensions. We are ready at any time to resume the talks in
Warsaw.\n10.\tFinally, nowhere in the world today is there more urgent need
to move from confrontation to negotiation than in the Middle East. The
legacy of bitterness, hatred and suspicion there contains the seeds of even
greater tragedy not only for the Middle East but for the world, unless there
is early progress toward reconciliation and peace based on the legitimate
interests of all the States and peoples of the region.\n11.\tAs the United
Nations community well knows, the United States has made major diplomatic
efforts in recent months and weeks to help bring about peace at long last in
the Middle East. While recent developments in the area, including the tragic
death of President Nasser, have cast a shadow over the high hopes of just a
few weeks ago that a beginning toward that goal had been made, those hopes
have not been extinguished. For the most part, the guns remain silent along
the ceasefire lines. Security Council resolution 242 (1967) stands as a
beacon pointing the way to peace, and the parties concerned, in accepting
the United States initiative of June, have committed themselves to carry out
that resolution in all its parts. Ambassador Jarring is ready to resume his
mission whenever conditions permit. We earnestly hope that the parties
themselves and all others concerned will now bend every effort to the task
of building on the beginning that has already been made.\n12.\tThis will
require above all serious steps to restore and strengthen confidence in the
integrity of the ceasefire and military standstill agreement along the Suez
Canal. It will also require, if negotiations are to move forward, a
willingness by the parties to move from their maximum positions on the
elements of a realistic and equitable peace settlement. In this spirit, my
Government is prepared to contribute to renewed efforts toward peace in the
Middle East.\n13.\tIn this anniversary year the United States, like many
other Members, has given especially careful thought to means by which the
United Nations can be better enabled to fulfill its purposes.\n14.\tMy
delegation has listened, and will continue to listen attentively, to the
suggestions of others in this regard. In the same spirit the United States
will offer proposals, or support the proposals of others, with these
objectives among others: first, to put United Nations peacekeeping
operations on a firmer and more reliable basis; second, to make the United
Nations more effective in the peaceful settlement of disputes; third, to
promote new steps in the field of disarmament; fourth, to move closer by
peaceful means to the goals of decolonization and human equality; fifth, to
support the Second Development Decade; sixth, to help slow down too rapid
rates of population growth; seventh, to promote the exploitation of the
seabeds and outer space for human benefit; eighth, to organize common action
within the United Nations to safeguard the global environment; ninth, to
encourage young people from all parts of the world to devote themselves to
international service; tenth, to improve the organization, administration
and procedures of the United Nations.\n15.\tIf the twenty-fifth General
Assembly were to move forward substantially in these directions, this
session would indeed rank as one of the most fruitful in United Nations
history.\n16.\tLet me now comment briefly on these points.\n17.\tWe shall
all of us on this podium and in the First Committee have much to say during
this session about strengthening international peace and security, and about
the need to live up to the Charter. This is quite proper. However, what is
needed most in this respect is not still more sweeping declarations that
merely restate the purposes and principles of the Charter. What is needed is
more effective United Nations action on the concrete issues before us, on
the Middle East, on United Nations peacekeeping procedures, on means of
pacific settlement. Until we demonstrate our willingness to take such
effective action, world opinion will continue to be skeptical, no matter
what ringing declarations we may make, as to whether the Members of the
United Nations are really serious about strengthening international peace
and security.\n18.\tI now come to a subject which should be at the very
center of our concern if we wish in fact to strengthen peace and security
through the United Nations. I speak of United Nations
peacekeeping.\n19.\tBecause the world situation in the past twenty- five
years developed differently from what the founders of the United Nations
foresaw, it has not been found possible to create forces for the maintenance
of international peace and security along the lines laid down in certain
articles of Chapter VII. We would hope that at some point the provisions of
the Charter designed for enforcement action can be implemented.\n20.\tIn the
meantime, however, a modus vivendi has been built up, entirely consistent
with the Charter, which has carried out significant and successful
peacekeeping operations, sometimes with observers, sometimes with
substantial forces, in the Middle East, in Kashmir, in the Congo, in Cyprus
and elsewhere. Still it is all too clear that these ad hoc and improvised
arrangements need major improvement in several respects if future
peacekeeping is to be effective.\n21.\tThe Assembly's Special Committee on
Peacekeeping Operations, under the able and patient leadership of Ambassador
Cuevas Cancino of Mexico, has been laboring for nearly two years at the task
of developing peacekeeping guidelines. During the past year the United
States has put forward concrete proposals for new procedures that would meet
the political concerns of all and yet allow United Nations forces to move
rapidly to carry out decisions of the Security Council. But the broad
agreement that all desire is still lacking.\n22.\tIt is high time however
that the General Assembly, in this twenty-fifth anniversary year, demanded
significant movement on this problem. It would be a mistake not to move at
all until all are agreed to move all the way. There is strong reason and
sufficient consensus for some forward movement at this Assembly. The United
States will be prepared to join with others in concrete proposals to this
effect.\n23.\tIn this connexion we welcome the statement made at Lusaka in
the Declaration of the nonaligned countries concerning the United Nations,
that measures should be taken at this session of the General Assembly to
strengthen United Nations peacekeeping, and we look forward with interest to
proposals that may be made by those present at that meeting.\n24.\tOf
course, peacekeeping without peaceful settlement is only an expedient,
necessary but incomplete and inconclusive. The most potent preventive of
conflict is not machinery but the will of disputing parties to show
restraint and a spirit of conciliation, and to persist in peaceful methods
until settlement is reached. This is a solemn obligation of every United
Nations Member under the Charter, and no nation deserves more honor than
those who have lived up to it in spite of every obstacle.\n25.\tIn this past
year such peaceful settlements, or major steps towards them, have been
recorded in a number of situations which reflect great credit on the
statesmanship of those involved. I have in mind, for example, the progress
made, with the assistance of the Organization of American States, toward
resolving the dispute between El Salvador and Honduras; and also the recent
peaceful decision on the future of Bahrain, in which good offices provided
by the Secretary General played a major role.\n26.\tAs these cases show,
where the will to settle exists, effective machinery can do much to help.
This is true of regional organizations, and it is no less true of the
relevant organs of the United Nations, above all the Security
Council.\n27.\tIn this connexion we have welcomed the Security Council's
decision [see 1544th meeting] as proposed by Finland, to hold periodic
high-level meetings under Article 28 of the Charter. My country looks
forward to taking part in the first such meeting later this fall. We welcome
likewise the valuable suggestion of Brazil committees of the Council,
including the parties to a dispute, might be created to help settle
particular disputes.\n28.\tTo the same end, the United States would favor
the reactivation of certain machinery for peaceful settlement which has long
been dormant. Many years ago the General Assembly created a Panel on Inquiry
and Conciliation [resolution 268 (III)], and more recently a register of
fact-finding experts. My Government will soon nominate qualified individuals
for both of these bodies. We believe fact-finders should be available to\n1
Third Conference of Heads of State or Government of Non Aligned Countries,
held at Lusaka, Zambia, from 8 to 10 September 1970.\nthe SecretaryGeneral
or other organs of the United Nations, or directly to Member States, to
report on the facts of situations of international concern at the request or
with the consent of one or more of the parties.\n29.\tFinally, it is most
important that we rejuvenate and strengthen the highest organ of judicial
settlement, the International Court of Justice. The United States recently
joined with others in the Security Council in referring to the Court for its
advisory opinion certain aspects of the Namibia situation. We have also
joined with others in submitting to this General Assembly an agenda item
calling for a review of the role of the Court. We hope this review will
suggest ways to enable the Court to make a more substantial contribution to
the establishment of the rule of law among nations.\n30.\tThe world has
witnessed in recent weeks shocking examples of the consequences of failure
to abide by this rule of law examples which have been alluded to by numerous
previous speakers on this podium. Hundreds of innocent individuals engaged
in peaceful international travel have been brutally seized as hostages in a
conflict in which they were in no way involved. Not only their safety and
convenience have been placed in jeopardy but their very lives. As President
Nixon pointed out in this forum a year ago, this issue 'involves the
interests of. . . every air passenger and the integrity of that structure of
order upon which a world community depends' [1755th meeting, para.
74].\n31.\tThat this view is widely shared is evident from the almost
universal condemnation of these most recent acts by Governments the world
over and by the resolution on this matter adopted unanimously by the
Security Council [286 (1970)]. But condemnation is not enough. It is time to
act. The Tokyo Convention, providing for the prompt return of hijacked
aircraft passengers and crew, requires the broadest international support.
The same is true of the International Civil Aviation Organization's draft
multilateral convention for the extradition and punishment of hijackers. In
addition, the United States has submitted new proposals to the ICAO, for
which I urge your most earnest consideration and support.\n32.\tThe General
Assembly's extensive annual debates and resolutions on many aspects of arms
control and disarmament have long played, and continue to play, an important
part in international negotiations on this most critical of all our
problems. I need only mention the partial nuclear test-ban Treaty,  the
outer space Treaty [resolution 2222 (XXI)] and the Treaty on the
NonProliferation of Nuclear Weapons [resolution 2373 (XXII)]. The
disarmament conference at Geneva has this year registered significant
progress by achieving wide consensus on a draft treaty to prohibit the
emplacement of weapons of mass destruction on a vast area of the earth's
surface the seabed beneath the oceans.  We hope this draft treaty will be
supported by the General Assembly at this session and opened for signature
shortly thereafter.\n33.\tI can assure the Assembly that the United States
Government is unceasingly aware of the worldwide concern and need for an end
to the arms race and the perilous burden of armaments in all its forms, and
that we shall do whatever one negotiating partner can do to bring about new
agreements on new steps toward the ultimate goal of general and complete
disarmament.\n34.\tPermit me to say a word about a matter of quite different
but equally wide concern that of human rights throughout the world.\n35.\tIn
addition to the basic responsibility of Governments to maintain human rights
for all their citizens, the United Nations under the Charter has a clear
responsibility in this regard. To strengthen the implementation of that
responsibility my Government hopes the General Assembly at this session will
create a new post, that of High Commissioner for Human Rights, to advise
United Nations organs in this field and to assist States, at their request,
with human rights problems. This proposal has been thoroughly studied and
fully justified. It deserves a high priority at the twenty-fifth session of
the General Assembly.\n36.\tA particular issue of human rights that is of
acute concern to my Government, among others, is the protection of the
rights of prisoners of war. While these rights have long since been
internationally guaranteed, they are in practice still denied to many
prisoners of war, as American wives and families of such prisoners know to
their great sorrow. The United States strongly hopes that this Assembly will
press for worldwide observance of the Geneva Convention relative to the
Treatment of Prisoners of War, verified by impartial inspection. It is
intolerable that the tragedy of armed conflict should be compounded by
additional inhumanities in violation of international norms.\n37.\tIn no
field have the aims of the United Nations found more dramatic fulfilment
during its first quarter of a century than in the field of decolonization.
Before the Second World War a third of mankind lived in colonial
dependencies. Today the proportion is less than 1 per cent. A revolution of
independence has created some sixty new nations and has been the main factor
in increasing the membership of the United Nations from 51 to
126.\n38.\tHowever, the situation in much of southern Africa is still
characterized by the denial of self-determination and of racial equality.
The United Nations and its member States must continue to exert peaceful
efforts to defend and advance these basic human rights of the peoples
concerned. Their cause is just and must in time prevail.\n39.\tAgainst these
evils the United States in recent months has taken further steps. We put
into effect last May a new policy designed to discourage United States
investment in, and trade with, Namibia so long as South Africa remains in
unlawful occupation. We urge other Governments to join us in this policy. My
Government continues strictly to observe resolutions in regard to sanctions
against the illegal regime in Rhodesia and in regard to the sale of arms to
South Africa. We have closed our consulate in Salisbury. We shall continue
in every peaceful and practicable way to pursue the ends of justice,
equality and self determination.\n40.\tHigh on the list of the United
Nations contributions to human progress is its longstanding effort to assist
in comprehensive global development. That effort is being continued and, we
hope, improved and strengthened through the launching of the Second United
Nations Development Decade.\n41.\tThe success of the Decade will depend most
of all on the policies of Member States, both developed and developing. To
bring all these policies into effective focus is the ambitious aim of the
development strategy. The United States strongly supports the strategy which
is before this Assembly [A 17982, para. 16] and intends to participate fully
in this common enterprise.\n42.\tI wish to leave no doubt of the serious
commitment of the United States to the Second Development Decade. President
Nixon, in a series of policy decisions and especially in his recent
announcement on the new focus of United States economic assistance, has made
clear our intention:\n(a)\tTo reverse the downward trend in United States
development assistance;\n(b)\tTo increase substantially United States
contributions to multilateral development institutions including the World
Bank, the International Development Association, the regional development
banks and the United Nations Development program so that, as the
capabilities of these institutions increase, we may be able to channel
through them most of our official development assistance;\n(c)\tFurther to
encourage the efforts of donor nations to 'untie' their bilateral aid to
developing countries from the obligation to import products of the donor
country;\n(<d) To bring United States science and technology more
effectively to bear on the problems of development, and for this purpose to
create a new United States International Development Institute;\n(<?) To
take new steps to stimulate American private investment in developing
countries within the framework of the developing countries'
plans.\n43.\tThese approaches are already reflected in the new policies my
country has announced toward Latin America and Africa, which stress
continued assistance, greater multilateral participation, and increased
trade and investment.\n44.\tFinally, bearing in mind the crucial connexion
between trade and development, the United States is pressing for a
liberalized system of generalized tariff preferences for products of
developing countries, with preferential access to the American market, and
we are urging that the developing countries receive similar access to the
markets of all developed countries.\n45.\tI wish to add a particular comment
on the role of the United Nations Development program. This program, in
cooperation with the specialized agencies, has done important pioneering
work in development assistance. Its machinery, however, was built for a
smaller program and must be reorganized to meet its growing
responsibilities. With the aid of last year's excellent 'capacity study' ,
the program is now preparing to put the necessary reforms into effect. In
planning our future contributions to this important program we in the United
States will give major weight to the progress actually achieved m
undertaking these reforms.\n46.\tIn recent years people all over the world
have suddenly awakened to the inexorable and tragic fact that excessive
population growth can, if continued much longer, frustrate all our hopes for
peace, justice and progress. There can be no progress for the majority of
mankind if population growth outstrips all available means of development.
There can be no justice for the majority of mankind where population expands
faster than production and social services. There can be no peace for the
majority of mankind where progress and justice are unattainable because of
unrestrained population growth.\n47.\tThis is not a problem confined to
either developed or developing countries. In my own country, although our
growth rate has recently slowed to about 1 per cent a year, we have adopted
as a national goal the availability within five years of family planning
services to every citizen.\n48.\tClearly, the need is equally urgent in many
nations striving for development whose annual population growth in some
cases approaches 4 per cent which means a doubling of the number of people
in less than twenty- years. The care and feeding of such enormous numbers of
dependent children, their upbringing in conditions compatible with human
dignity, could constitute such a burden as to nullify progress in economic
development and to cause living standards to remain at past low levels or
even to fall lower.\n49.\tThe United States is convinced that the vigorous
pursuit of family planning policies is an indispensable element in the
strategy of development. In this conviction we have pledged this year $7.5
million to the recently established United Nations Fund for Population
Activities, whose services to requesting Governments are rapidly
growing.\n50.\tIn the context of development I wish also to emphasize the
enormous potential of the world's deep seabeds, whose exploitation is just
now beginning to come within the reach of our technology.\n51.\tLast May
President Nixon, in a farreaching announcement concerning the oceans,
proposed that an international regime be established by treaty for the
exploitation of seabed resources beyond the depth of 200 meters. He further
proposed that this regime 'should provide for the collection of substantial
mineral royalties to be used for international community purposes,
particularly economic assistance to developing countries'. Early in August
the United States circulated in the United Nations seabed Committee  a draft
convention [A18021, annex V], embodying these and other important proposals
in the President's announcement.\n52.\tThese proposals, if carried out, will
amount to a new departure in the history of nations. Never in history has
the exploitation of resources of such great potential value been placed
under the supervision and regulation of an effective international
authority. Never in history has assurance been offered that the
international community could have a substantial, independent source of
revenue to be equitably divided to serve the interests of mankind as a
whole.\n53.\tThe United States hopes that the twenty-fifth session of the
General Assembly will advance this important enterprise so that a sound and
workable international seabed regime, backed up by effective machinery, can
come into being as soon as possible. To this end it is important that States
refrain from making further claims to jurisdiction over the seabeds or over
the waters of the oceans. We believe that a conference on subjects related
to the law of the sea, including seabeds, should be called as soon as
practicable and that preparatory steps should be initiated by this session
of the General Assembly.\n54.\tI come now to an issue of critical and
rapidly growing importance the protection of the human environment.
Development and protection of the environment are not mutually
contradictory; indeed, they must go hand in hand if the world is to be a fit
place in which to live. The United Nations is in a key position to foster
the necessary cooperation so that the needs of the environment, as well as
those of development, receive the energetic attention they
require.\n55.\tAlready the plans for the United Nations Conference on the
Human Environment in 1972 have begun to take shape and have helped to focus
the attention of United Nations Member Governments on this worldwide
challenge; but we should not wait for the Stockholm Conference before
launching necessary initiatives.\n56.\tTherefore the United States urges
that all of us here, representing both developed and developing countries,
work together to enable the United Nations to take the following steps.
Firstly, it should identify those environmental problems, especially those
pollutants in the atmosphere and the oceans, which are or may be dangerous
on a global scale. Second, it should make plans for a coordinated world
monitoring network to keep track of these environmental dangers. This
network should build on existing programs, particularly those of the
agencies of the United Nations family, and should use the most advanced data
processing and satellite technology, such as the earth resource survey
satellites which my country has been developing. Third, it should collect
and analyze the suggestions of governments concerning environmental
guidelines for States, both developed and developing, as well as for
international institutions engaged in development programs. Fourth, it
should explore the possibility of establishing international air and water
quality standards. The United States hopes the Assembly at this session will
act to advance those important objectives.\n51. A newly acute problem which
threatens a growing number of societies is the epidemic spread of addiction
to dangerous drugs, especially among young people. This phenomenon has
mushroomed in a very few years, not only in my own country but in a number
of others, both developed and developing. It creates untold misery,
violence, lawlessness and economic and human loss.This menace must be
stopped. To do so it will be necessary to reinforce existing international
agreements and to strengthen the longstanding and excellent work of agencies
in this field, especially those of the United Nations. We are glad to note
that the Commission on Narcotic Drugs is meeting now to deal with the whole
range of problems involved, from the poppy field through the international
syndicate to the needle in the vein. My Government hopes that an adequate
action program will emerge from this process and will command the energetic
support of the community of nations. The United States has already offered,
subject to Congressional approval, a contribution of $2 million to such a
program.\n58.\tFinally, it is important that we make better use of the
talents of young people in international service, especially the service of
the United Nations. Many delegations to this session, including that of the
United States, contain youthful members as suggested by the General Assembly
a year ago. That is entirely fitting, because the fate of the United Nations
and indeed of world peace will soon be in their hands.\n59.\tWith that in
mind President Nixon, in his address to this Assembly a year ago [1755th
meeting], pledged the enthusiastic support of the United States for Iran's
proposal to establish an international volunteer service corps [see 1695th
meeting, para. 75], to work in the cause of development and to be recruited
on an individual basis from the people, principally young people, of many
countries. We hope that proposal will be given final approval in the current
session. We hope also that the United Nations will seek new ways to
encourage able young people to find careers in its Secretariat and those of
the other agencies of the United Nations family.\n60.\tIn these remarks I
have discussed only a few of the major tasks facing the United Nations in
the years ahead. Even those, however, are enough tc make it obvious that, if
we indeed address ourselves to such tasks, the effectiveness of this
Organization will be tested more severely than ever. To meet this test we
shall have to be far more attentive than has been our habit to many matters
of structure, organization and procedure. For example, I would mention the
following.\n61.\tIn considering applications for membership by very small
States, we must make sure that they are not only willing but also, as the
Charter stipulates, able to carry out the Charter's obligations. As the
SecretaryGeneral has for years pointed out, many Territories now moving
towards independence are too small, either in population or in resources or
both, to carry out the minimum obligations which membership requires. Yet
these very small entities need more than most the assistance that the United
Nations system can provide. Where the burden of membership would be
excessive, we should provide a form of association with the United Nations
which would enable such States to enjoy the benefits without the burdens of
the system.\n62.\tThe persistence of the United Nations financial deficit
undermines confidence in the Organization, threatens its capabilities in
many fields and casts a cloud over its future. The United States welcomes
the SecretaryGeneral's recent call for 'a concerted effort to restore the
financial solvency of the Organization'. We hope that he will himself take a
lead in such an effort, in which we shall certainly play our part.\n63.\tIn
the annual choice of non-permanent members of the Security Council, it would
be well that, as the Charter requires, due regard be specially paid, in the
first instance, to the prospective member's contribution to the maintenance
of international peace and security, rather than merely to rotation among
the members of geographic groups.\n64.\tThe primary consideration in
selecting individuals for posts in the Secretariat, above all for senior
posts, should be fully to meet the Charter's 'paramount consideration'
namely 'the highest standards of efficiency, competence, and
integrity'.\n65.\tWe must at long last take decisive steps to streamline the
excessively time-consuming organization and procedures of the General
Assembly, as Canada has wisely proposed, or else we shall either 'drown in a
sea of words' or suffocate under an avalanche of paper.\n66.\tWe must take
more effective measures to ensure that the entire United Nations system is
so organized and managed that it responds efficiently to the directives of
its governing organs and to the priority needs of the world. That will
require much better administrative and budgetary coordination and control
than we have yet achieved.\n67.\tThe United States offers these suggestions
in the spirit of the twenty-fifth anniversary session, which we understand
to be a spirit of sober determination to make this Organization more
effective, to make its future more responsive than its past to the
imperative needs of men, women and children everywhere. Many other
delegations have offered or will offer their proposals in the same spirit.
The test of our seriousness and our success will be how much of this agenda
of objectives we can begin to carry out.\n68.\tWe are assembled from the
four corners of the earth. The interests of the Governments we speak for
often seem to be and sometimes are in contradiction. But by our presence
here, by our commitment to the United Nations and its Charter, we have
acknowledged that we also have interests in common, interests in peace,
justice and progress, interests in the continued habitability of our planet,
common interests which we are at last beginning to recognize are inescapable
and overriding.\n69.\tThe question now is, do we have the wit not only to
perceive these common interests in some vague rhetorical way, but also to
act upon them together realistically and decisively even at the cost
sometimes of older and narrower interests? If we do not do so, history may
sweep aside not only this Organization but also the nations that compose
it.\n70.\tThe SecretaryGeneral has said that we may have only ten years left
to cope effectively with the problems of our times before they become so
staggering as to be beyond our capacities. As we enter the Disarmament
Decade and the Second Development Decade, let us keep that warning foremost
in our minds and let us be determined to act together to avert
catastrophe.\n\n\n\n\n
Name: text, dtype: object

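Counting the speeches given each year calls for our new tool for grouping data. This is the split-apply-combine pattern, which you may have met before: split the rows into groups (here, one group per year), apply a function to each group (here, counting the speeches), and combine the results into a new table. The pattern is so common in data science that Pandas provides it directly through groupby.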
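To see what groupby is doing behind the scenes, here is a sketch of the three steps done by hand on a tiny made-up DataFrame (the rows are invented, not taken from undf), followed by the one-line groupby call that produces the same counts.

```python
import pandas as pd

# A tiny made-up stand-in for undf: one row per speech
speeches = pd.DataFrame({
    "year": [1970, 1970, 1971, 1971, 1971],
    "country": ["USA", "FRA", "USA", "FRA", "IND"],
    "text": ["a", "b", "c", "d", "e"],
})

# Split: one sub-DataFrame per year
pieces = {year: group for year, group in speeches.groupby("year")}

# Apply: count the speeches in each piece
counts = {year: group["text"].count() for year, group in pieces.items()}

# Combine: gather the per-year results into a single DataFrame
by_hand = pd.DataFrame({"year": list(counts), "text": list(counts.values())})

# groupby performs all three steps in one expression
by_groupby = speeches.groupby("year", as_index=False)["text"].count()

print(by_hand)
print(by_groupby)
```

Both tables report 2 speeches for 1970 and 3 for 1971; the groupby version is simply shorter and faster on large data.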

by_year = undf.groupby('year', as_index=False)['text'].count()
by_year.head()
   year  text
0  1970    70
1  1971   116
2  1972   125
3  1973   120
4  1974   129
alt.Chart(by_year).mark_bar().encode(x='year:N',y='text')
Bar graph of the number of speeches given each year from 1970 to 2015.
by_country = undf.groupby('country',as_index=False)['text'].count()
by_country.head()
   country  text
0      AFG    45
1      AGO    38
2      ALB    46
3      AND    22
4      ARE    44
alt.Chart(by_country, title='speech distribution').mark_bar().encode(x=alt.X('text', bin=True), y='count()')
"Bar chart titled speech distribution, with the number of speeches per country (text, binned) on the x axis and the number of countries in each bin on the y axis."
by_country.loc[by_country.text.idxmax()]
country    ALB
text        46
Name: 2, dtype: object
by_country.loc[by_country.text.idxmin()]
country    EU
text        5
Name: 58, dtype: object

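Those answers are not very satisfactory. First, idxmax and idxmin return only the first row that holds the maximum or minimum value. ALB has 46 speeches, matching the 46 sessions from 1970 to 2015, so it is most likely just the alphabetically first of several countries that spoke every year. Second, we can only guess which country ALB or EU might be. Somewhat distressingly, one is a three-letter code and the other a two-letter code, and EU is most likely the European Union rather than a country at all. We will want to augment this data using our world factbook data or the data we scraped. I have a complete table ready for you to load, so you don't have to scrape it again.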
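Here is a sketch of the tie problem on made-up counts shaped like by_country (the rows for ARG and AUS are invented for illustration): idxmax reports only the first maximum, while a boolean filter returns every row that shares it.

```python
import pandas as pd

# Made-up counts in the same shape as by_country; three countries tie at 46
counts = pd.DataFrame({
    "country": ["AFG", "ALB", "ARG", "AUS", "EU"],
    "text": [45, 46, 46, 46, 5],
})

# idxmax returns the label of the FIRST row holding the maximum
first_max = counts.loc[counts.text.idxmax(), "country"]

# A boolean filter keeps every row equal to the maximum
all_max = counts[counts.text == counts.text.max()]

print(first_max)                  # ALB
print(all_max.country.tolist())   # ['ALB', 'ARG', 'AUS']
```

The same filter with counts.text.min() answers the "least times" question without hiding ties.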

c_codes = pd.read_csv('Data/country_codes.csv')
c_codes.head()
  ---------------------------------------------------------------------------
  UnicodeDecodeError                        Traceback (most recent call last)
  pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()

  pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_with_dtype()

  pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._string_convert()

  pandas/_libs/parsers.pyx in pandas._libs.parsers._string_box_utf8()

  UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc5 in position 0: invalid continuation byte

  During handling of the above exception, another exception occurred:

  UnicodeDecodeError                        Traceback (most recent call last)
   in 
  ----> 1 c_codes = pd.read_csv('../Data/country_codes.csv')
        2 c_codes.head()

  ~/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
      700                     skip_blank_lines=skip_blank_lines)
      701
  --> 702         return _read(filepath_or_buffer, kwds)
      703
      704     parser_f.__name__ = name

  ~/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
      433
      434     try:
  --> 435         data = parser.read(nrows)
      436     finally:
      437         parser.close()

  ~/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py in read(self, nrows)
    1137     def read(self, nrows=None):
    1138         nrows = _validate_integer('nrows', nrows)
  -> 1139         ret = self._engine.read(nrows)
    1140
    1141         # May alter columns / col_dict

  ~/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py in read(self, nrows)
    1993     def read(self, nrows=None):
    1994         try:
  -> 1995             data = self._reader.read(nrows)
    1996         except StopIteration:
    1997             if self._first_chunk:

  pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read()

  pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()

  pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows()

  pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_column_data()

  pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()

  pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_with_dtype()

  pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._string_convert()

  pandas/_libs/parsers.pyx in pandas._libs.parsers._string_box_utf8()

  UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc5 in position 0: invalid continuation byte
  

8.2.3. OH NO, what the heck!!

Unicode errors can be a huge pain, but they are a fact of life for anyone dealing with data from multiple sources. In this case, we can use the Unix `file` command to get a bit more information (-I is the macOS spelling of the option; on Linux, use file -i):

$ file -I country_codes.csv
country_codes.csv: text/plain; charset=iso-8859-1

The important part of that output is the character set: `iso-8859-1`. This matters because the encoding tells Python how to interpret stored bits as characters we would recognize. For example, take the familiar copyright symbol ©. In iso-8859-1 it is stored as the single byte 10101001 (169 in decimal). Aren't you glad you don't have to remember that? Whenever Python reads or displays text, it has to know how that text was encoded, that is, how to interpret those bits. Several encodings are in common use today.

  • ASCII (American Standard Code for Information Interchange): This is one of the oldest encodings and has been in use for decades. It is a 7-bit code, so it can encode only 128 characters, numbered 0 to 127, and that is exactly the range Python accepts as ASCII. This was fine for American English in the early days of computing, but it does not work in today's world of many languages and many more emojis.

  • ‘utf-8’: This is probably the most common encoding in use today. UTF-8 stands for Unicode Transformation Format, 8-bit, and it can encode all 1,112,064 valid Unicode code points. It is variable-length: each character takes one to four bytes (8 to 32 bits), and the 128 ASCII characters take just one byte each, so plain ASCII text is also valid UTF-8.

  • ‘iso-8859-1’ (also called ‘latin-1’): This encoding uses all 8 bits of a byte, so it can represent 256 characters, the first 128 of which are the same as ASCII. Because it uses exactly one byte per character, it can represent only the first 256 Unicode characters. ISO 8859 is a family of one-byte encodings for different alphabets, and the ‘-1’ selects part 1, the Western European ‘latin-1’ character set.
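To make these differences concrete, here is a short standard-library check of how © and a few other characters are stored. encode_error is a small helper written only for this illustration, not a built-in; the reason strings it returns are the ones Python's codecs report.

```python
# The copyright sign is code point 169: hex a9, binary 10101001
sign = "©"
print(ord(sign), hex(ord(sign)), format(ord(sign), "08b"))   # 169 0xa9 10101001

# iso-8859-1 stores it as that single byte; UTF-8 needs two bytes for it
print(sign.encode("iso-8859-1"))   # b'\xa9'
print(sign.encode("utf-8"))        # b'\xc2\xa9'

# UTF-8 is variable-length: one to four bytes per character
sizes = {ch: len(ch.encode("utf-8")) for ch in ["A", "é", "€", "😀"]}
print(sizes)                       # {'A': 1, 'é': 2, '€': 3, '😀': 4}

def encode_error(text, encoding):
    """Return the codec's reason if text cannot be encoded, else None."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as err:
        return err.reason
    return None

print(encode_error("é", "ascii"))        # ordinal not in range(128)
print(encode_error("é", "iso-8859-1"))   # None (é is code point 233)
print(encode_error("€", "iso-8859-1"))   # ordinal not in range(256)
```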

So, let’s try a little experiment. We can represent 169 as 10101001 or as the hexadecimal value a9, which is easier to work with in Python.

b'\xa9'.decode('utf8')

The code above takes the single byte with hexadecimal value a9 and tries to decode it using UTF-8.

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-14-4c06286911b5> in <module>
----> 1 b'\xa9'.decode('utf8')

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa9 in position 0: invalid start byte

Aha! That error message looks familiar, and you will run into it many times when working with data from the internet.
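Why does UTF-8 reject a9? UTF-8 uses the leading bits of each byte to mark its role: 0xxxxxxx is a complete one-byte character, 110xxxxx, 1110xxxx and 11110xxx begin two-, three- and four-byte characters, and 10xxxxxx can only continue a character that has already begun. The sketch below (standard library only; decode_reason is a helper made up for this illustration) shows that a9 is a stray continuation byte. The earlier error about c5 is the opposite problem: c5 begins a two-byte character, but the byte after it does not continue it. In iso-8859-1, c5 is the letter Å, so a name such as Åland Islands saved in that encoding would produce exactly that error; this is a guess for illustration, not a report of what is in the file.

```python
def decode_reason(raw, encoding="utf-8"):
    """Return the decoder's complaint about raw, or None if it decodes cleanly."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as err:
        return err.reason
    return None

# 0xa9 is 10101001: a byte starting with bits 10 may only continue
# a multi-byte character, so it cannot start one
print(format(0xA9, "08b"), decode_reason(b"\xa9"))   # 10101001 invalid start byte

# 0xc5 is 11000101: the leading bits 110 announce a two-byte character,
# so the following byte must start with bits 10
print(decode_reason(b"\xc5\x81"))                     # None (decodes to 'Ł')
print(decode_reason("Åland".encode("iso-8859-1")))    # invalid continuation byte
```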

Let's give ASCII a try.

b'\xa9'.decode('ascii')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-15-1ee5bf3d809c> in <module>
----> 1 b'\xa9'.decode('ascii')

UnicodeDecodeError: 'ascii' codec can't decode byte 0xa9 in position 0: ordinal not in range(128)

The error says the byte is not in range(128), and indeed 169 is well outside ASCII's range of 0 to 127.

Let's try it with iso-8859-1.

b'\xa9'.decode('iso-8859-1')
'©'

Success!!
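One caution about this success: iso-8859-1 assigns a character to every possible byte value, so it will decode any file without complaint, even one that is really UTF-8. In that case you get garbled characters instead of an error. A small standard-library check:

```python
# iso-8859-1 gives every one of the 256 byte values a character,
# so decoding with it never raises an error
all_bytes = bytes(range(256))
print(len(all_bytes.decode("iso-8859-1")))   # 256

# The flip side: UTF-8 bytes decoded as iso-8859-1 "succeed" but come out garbled
utf8_bytes = "café".encode("utf-8")          # b'caf\xc3\xa9'
print(utf8_bytes.decode("iso-8859-1"))       # cafÃ©
print(utf8_bytes.decode("utf-8"))            # café
```

So it is worth glancing at a few decoded rows, not just checking that no error was raised.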

Now let’s reread the file using the new character set.

c_codes = pd.read_csv('Data/country_codes.csv', encoding='iso-8859-1')
c_codes.head()
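If you do not know a file's encoding in advance, one option is to try a short list of encodings in order. The function read_csv_any below is a hypothetical helper, not part of pandas, and the CSV content is made up; only pd.read_csv and its encoding parameter are real pandas API. iso-8859-1 goes last because it accepts any sequence of bytes, so nothing listed after it would ever be tried.

```python
import io
import pandas as pd

def read_csv_any(raw, encodings=("utf-8", "iso-8859-1")):
    """Hypothetical helper: try each encoding in order and return
    (DataFrame, encoding that worked)."""
    for enc in encodings:
        try:
            return pd.read_csv(io.BytesIO(raw), encoding=enc), enc
        except UnicodeDecodeError:
            continue  # wrong guess, try the next encoding
    raise ValueError(f"none of {encodings} could decode the data")

# Made-up CSV content saved as iso-8859-1 (the byte 0xc5 is 'Å' there)
raw = "name,code\nÅland Islands,ALA\nAlbania,ALB\n".encode("iso-8859-1")

codes, used = read_csv_any(raw)
print(used)                  # iso-8859-1
print(codes.loc[0, "name"])  # Åland Islands
```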
# Topics to search for. Each term starts with a space, so it only matches where
# a word begins after a space (' war' does not match 'software', but it does
# match ' warning')
topics = [' nuclear', ' weapons', ' nuclear weapons', ' chemical weapons',
          ' biological weapons', ' mass destruction', ' peace', ' war',
          ' nuclear war', ' civil war', ' terror', ' genocide', ' holocaust',
          ' water', ' famine', ' disease', ' hiv', ' aids', ' malaria',
          ' cancer', ' poverty', ' human rights', ' abortion', ' refugee',
          ' immigration', ' equality', ' democracy', ' freedom',
          ' sovereignty', ' dictator', ' totalitarian', ' vote', ' energy',
          ' oil', ' coal', ' income', ' economy', ' growth', ' inflation',
          ' interest rate', ' security', ' cyber', ' trade', ' inequality',
          ' pollution', ' global warming', ' hunger', ' education',
          ' health', ' sanitation', ' infrastructure', ' virus',
          ' regulation', ' food', ' nutrition', ' transportation',
          ' violence', ' agriculture', ' diplomatic', ' drugs', ' obesity',
          ' islam', ' housing', ' sustainable', ' nuclear energy']
undf.head()
   session  year  country  text
0       44  1989      MDV  It is indeed a pleasure for me and the member...
1       44  1989      FIN  \nMay I begin by congratulating you. Sir, on ...
2       44  1989      NER  \nMr. President, it is a particular pleasure ...
3       44  1989      URY  \nDuring the debate at the fortieth session o...
4       44  1989      ZWE  I should like at the outset to express my del...
year_summ = undf.groupby('year', as_index=False)['text'].sum()
year_summ.head()
   year  text
0  1970  126.\t In this anniversary year the General As...
1  1971  83.\t Mr. President, the first words of my del...
2  1972  Since the twenty-sixth session of the General ...
3  1973  1.\tIt is a great pleasure for me to congratu...
4  1974  Mr. President, first I should like to extend m...
year_summ['i'] = year_summ.text.str.count('income')
year_summ['ir'] = year_summ.text.str.count('interest rate')
year_summ
    year  text                                                i   ir
 0  1970  126.\t In this anniversary year the General As...  37    0
 1  1971  83.\t Mr. President, the first words of my del...  27    1
 2  1972  Since the twenty-sixth session of the General ...  31    2
 3  1973  1.\tIt is a great pleasure for me to congratu...   36    2
 4  1974  Mr. President, first I should like to extend m...  57    3
 5  1975  104.\t Mr. President, on behalf of the delegat...  29    1
 6  1976  Allow me first to say how pleased I am to see ...  52    0
 7  1977   \n1.\t'O praise the Lord, all ye nations: pr...   42    0
 8  1978  210.\tI am particularly happy to be able in m...   30    1
 9  1979  My delegation is pleased to convey to the rep...   45    2
10  1980  I should like first of all to extend to Ambas...   34    6
11  1981  \n73.\t Mr. President, the Republic of Iraq an...  40   32
12  1982  First of all I wish to convey my warm \ncongra...  34   51
13  1983  1.\t It is my pleasure to address, in the nam...   40   53
14  1984  I have the honour to convey to the President ...   41  106
15  1985  I wish to convey to you, Sir, the felicitation...  41   68
16  1986  Allow me first, Sir, to congratulate you on y...   34   49
17  1987  \nAllow me at the outset. Sic, to convey to y...   36   37
18  1988  \nI ask the President to accept our congratul...   47   32
19  1989  It is indeed a pleasure for me and the member...   45   32
20  1990  Mr. President, allow me to congratulate you o...   50   16
21  1991  On behalf of my delegation and on my own beha...   38    7
22  1992  I shall read out the following statement\non b...  45    6
23  1993  Allow me to congratulate you sincerely, Sir,\n...  35    0
24  1994  On behalf of the Namibian\ndelegation, I wish ...  43    7
25  1995  Allow me at the outset, on behalf of the\ndele...  28    3
26  1996  The delegation of the Republic of the Congo\n...   20    2
27  1997  I wish to congratulate the President on his\n...   27    0
28  1998  The General Assembly has\nunanimously chosen M...  31    3
29  1999  Today, we look ahead to the\nnew millennium. A...  45    2
30  2000  I join my colleagues in\ncongratulating the Pr...  54    2
31  2001  On\nbehalf of the Comorian delegation, which ...   33    1
32  2002  Allow me\nto begin my statement by expressing...   19    0
33  2003  The people of Tuvalu,\non whose behalf I have...   26    2
34  2004  The United Nations\nfaces unprecedented challe...  39    1
35  2005  Sixty years ago at San Francisco, the United\n...  61    1
36  2006  In 2006, several important anniversaries coinc...  68    2
37  2007  It is a pleasure, Sir, to congratulate you on...   65    0
38  2008  It is an \nhonour for me to represent my count...  68    0
39  2009  I begin by joining others \nin congratulating ...  83    1
40  2010  It is a privilege and a \ngreat honour for me ...  63    0
41  2011  \nAllow me, first of all, to warmly congratula...  60    0
42  2012  First, I would like\nto express my sincere ap...   80    1
43  2013  Allow me at the outset, on \nbehalf of the Pre...  91    0
44  2014  I congratulate Mr. Sam \nKutesa on his assumpt...  74    0
45  2015  The Head of State of the Transition, Her Excel...  57    1
alt.Chart(year_summ[['year', 'i', 'ir']]).mark_line().encode(
    x='year', y='i')
Line graph showing mentions of income over time.
alt.Chart(year_summ[['year', 'i', 'ir']].melt(
    id_vars='year', value_vars=['ir', 'i'])).mark_line().encode(
    x='year:O', y='value', color='variable')
Line charts showing mentions of income and interest rate over time from 1970 to 2015.

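Interesting! Mentions of interest rate in the UN General Debate speeches were rare through 1980, then climbed steeply in the early 1980s to a peak of 106 in 1984. After that they declined, and by the early 1990s they had fallen back into single digits, close to their pre-1980 levels. Now, let's take a look at how often climate change and global warming are mentioned.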
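One caution before counting more phrases: str.count treats its argument as a regular expression, matches it anywhere (including inside longer words), and is case-sensitive. So 'income' also counts incomes, while Income at the start of a sentence is missed. Whether that is what you want depends on your question. This toy sketch, on two made-up sentences, shows how a word boundary (\b) and re.IGNORECASE change the counts.

```python
import re
import pandas as pd

# Two made-up sentences standing in for speeches
speeches = pd.Series([
    "Income matters, and incomes rose.",
    "War and global warming.",
])

# Case-sensitive substring: what year_summ.text.str.count('income') does
naive = speeches.str.count("income")
# Any capitalisation, still matching inside longer words
any_case = speeches.str.count("income", flags=re.IGNORECASE)
# Whole word only (\b marks a word boundary), any capitalisation
whole_word = speeches.str.count(r"\bincome\b", flags=re.IGNORECASE)
print(naive.tolist(), any_case.tolist(), whole_word.tolist())
# [1, 0] [2, 0] [1, 0]

# The same three variants for 'war', which also hides inside 'warming'
war_naive = speeches.str.count("war")
war_any_case = speeches.str.count("war", flags=re.IGNORECASE)
war_whole_word = speeches.str.count(r"\bwar\b", flags=re.IGNORECASE)
print(war_naive.tolist(), war_any_case.tolist(), war_whole_word.tolist())
# [0, 1] [0, 2] [0, 1]
```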

year_summ['gw'] = year_summ.text.str.count('global warming')
year_summ['cc'] = year_summ.text.str.count('climate change')
year_summ[['year', 'text', 'gw', 'cc']]
    year  text                                               gw   cc
 0  1970  126.\t In this anniversary year the General As...   0    0
 1  1971  83.\t Mr. President, the first words of my del...   0    0
 2  1972  Since the twenty-sixth session of the General ...   0    0
 3  1973  1.\tIt is a great pleasure for me to congratu...    0    1
 4  1974  Mr. President, first I should like to extend m...   0    0
 5  1975  104.\t Mr. President, on behalf of the delegat...   0    0
 6  1976  Allow me first to say how pleased I am to see ...   0    0
 7  1977   \n1.\t'O praise the Lord, all ye nations: pr...    0    0
 8  1978  210.\tI am particularly happy to be able in m...    0    0
 9  1979  My delegation is pleased to convey to the rep...    0    0
10  1980  I should like first of all to extend to Ambas...    0    0
11  1981  \n73.\t Mr. President, the Republic of Iraq an...   0    0
12  1982  First of all I wish to convey my warm \ncongra...   0    0
13  1983  1.\t It is my pleasure to address, in the nam...    0    0
14  1984  I have the honour to convey to the President ...    0    1
15  1985  I wish to convey to you, Sir, the felicitation...   0    0
16  1986  Allow me first, Sir, to congratulate you on y...    0    0
17  1987  \nAllow me at the outset. Sic, to convey to y...    0    0
18  1988  \nI ask the President to accept our congratul...    1    1
19  1989  It is indeed a pleasure for me and the member...   20   18
20  1990  Mr. President, allow me to congratulate you o...    9   12
21  1991  On behalf of my delegation and on my own beha...   20   30
22  1992  I shall read out the following statement\non b...   6   15
23  1993  Allow me to congratulate you sincerely, Sir,\n...   5   14
24  1994  On behalf of the Namibian\ndelegation, I wish ...   2    9
25  1995  Allow me at the outset, on behalf of the\ndele...   8   12
26  1996  The delegation of the Republic of the Congo\n...    4   16
27  1997  I wish to congratulate the President on his\n...    5   14
28  1998  The General Assembly has\nunanimously chosen M...  10   23
29  1999  Today, we look ahead to the\nnew millennium. A...   4   31
30  2000  I join my colleagues in\ncongratulating the Pr...   7   15
31  2001  On\nbehalf of the Comorian delegation, which ...    4   30
32  2002  Allow me\nto begin my statement by expressing...    6   25
33  2003  The people of Tuvalu,\non whose behalf I have...    4   25
34  2004  The United Nations\nfaces unprecedented challe...   9   42
35  2005  Sixty years ago at San Francisco, the United\n...   1   46
36  2006  In 2006, several important anniversaries coinc...  15   54
37  2007  It is a pleasure, Sir, to congratulate you on...   59  472
38  2008  It is an \nhonour for me to represent my count...  34  353
39  2009  I begin by joining others \nin congratulating ...  47  485
40  2010  It is a privilege and a \ngreat honour for me ...  28  368
41  2011  \nAllow me, first of all, to warmly congratula...  17  287
42  2012  First, I would like\nto express my sincere ap...    8  185
43  2013  Allow me at the outset, on \nbehalf of the Pre...  20  200
44  2014  I congratulate Mr. Sam \nKutesa on his assumpt...  16  307
45  2015  The Head of State of the Transition, Her Excel...  37  382
alt.Chart(year_summ[['year', 'gw', 'cc']]).mark_line().encode(
    x='year',y='gw')
"Line graph showing mentions of global warming over time."
alt.Chart(year_summ[['year', 'gw', 'cc']].melt(
    id_vars='year', value_vars=['cc', 'gw'])).mark_line().encode(
    x='year:O', y='value', color='variable')
"Line charts showing mentions of global warming and climate change over time from 1970 to 2015."

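Fascinating! Until the late 1980s, neither global warming nor climate change was mentioned, apart from a handful of isolated mentions. From 1989 the two phrases appeared at broadly similar, modest rates, with climate change gradually pulling ahead through the 1990s and early 2000s. Then in 2007 mentions of climate change jumped to 472, up from 54 the year before, and they have stayed far above their earlier levels ever since. This raises all kinds of interesting questions. Which countries were talking about these topics, and when? This is exactly the kind of thing that happens in data science: one question, or the visualization of one or more items, often leads to further and even more interesting questions.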
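One way to start on "which countries, and when?" is to count mentions in each speech and then group by country and year. The sketch below runs on a few made-up rows shaped like undf (the country codes are real, but the texts and counts are invented); on the real data the same two steps would apply to undf itself.

```python
import pandas as pd

# Made-up rows in the same shape as undf
speeches = pd.DataFrame({
    "year": [2006, 2006, 2007, 2007, 2007],
    "country": ["TUV", "USA", "TUV", "USA", "MDV"],
    "text": [
        "climate change threatens us",
        "trade and security",
        "climate change, climate change",
        "climate change matters",
        "sea levels and climate change",
    ],
})

# Count mentions in every speech, then total them per country and year
speeches["cc"] = speeches.text.str.count("climate change")
cc_by_country = (speeches.groupby(["country", "year"], as_index=False)["cc"].sum()
                 .sort_values("cc", ascending=False))
print(cc_by_country)

# Or reshape to one row per year and one column per country
cc_table = speeches.pivot_table(index="year", columns="country", values="cc",
                                aggfunc="sum", fill_value=0)
print(cc_table)
```

The long form is convenient for sorting and for Altair charts; the wide pivot table makes it easy to scan one country across the years.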

year_summ['pollution'] = year_summ.text.str.count('pollution')
year_summ['terror'] = year_summ.text.str.count('terror')
alt.Chart(year_summ[['year', 'terror']]).mark_line().encode(
    x='year:O', y='terror')
"Line chart showing mentions of terror over time from 1970 to 2015."
import numpy as np
nrows, ncols = 100000, 100
rng = np.random.RandomState(43)
df1, df2, df3, df4 = (pd.DataFrame(rng.rand(nrows,ncols)) for i in range(4))
%timeit df1 + df2 + df3 + df4
84.9 ms ± 1.02 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit pd.eval('df1 + df2 + df3 + df4')
38.7 ms ± 1.16 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
undf['text_len'] = undf.text.map(lambda x : len(x.split()))
undf.head()
   session  year  country  text                                              text_len
0       44  1989      MDV  It is indeed a pleasure for me and the member...      3011
1       44  1989      FIN  \nMay I begin by congratulating you. Sir, on ...      2727
2       44  1989      NER  \nMr. President, it is a particular pleasure ...      4860
3       44  1989      URY  \nDuring the debate at the fortieth session o...      2711
4       44  1989      ZWE  I should like at the outset to express my del...      4551
undf.groupby('country', as_index=False)['text_len'].mean().head()
   country     text_len
0      AFG  3014.444444
1      AGO  2645.315789
2      ALB  3482.369565
3      AND  2153.045455
4      ARE  2313.500000
alt.Chart(undf.groupby(
    'country', as_index=False)['text_len'].mean()).mark_bar().encode(
    alt.X('text_len', bin=True), y='count()')
"Bar chart with the mean speech length per country in words (text_len, binned) on the x axis and the number of countries in each bin on the y axis."
undf.groupby('country', as_index=False)['text_len'].mean().sort_values('text_len').head()
     country     text_len
 25      BRN  1146.870968
186      UZB  1484.700000
176      TON  1496.466667
141      PLW  1517.944444
103      LIE  1538.115385
undf.groupby('country', as_index=False)['text_len'].mean().sort_values(
    'text_len').tail()
     country     text_len
 53      EGY  3981.590909
101      LBY  4074.477273
 42      CUB  4100.217391
 81      IRL  4284.466667
150      RUS  4400.666667
