Extract token knowledge from text.
!python -m spacy download en_core_web_sm
Collecting en-core-web-sm==3.1.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.1.0/en_core_web_sm-3.1.0-py3-none-any.whl (13.6MB)
Installing collected packages: en-core-web-sm
  Found existing installation: en-core-web-sm 2.2.5
    Uninstalling en-core-web-sm-2.2.5:
      Successfully uninstalled en-core-web-sm-2.2.5
Successfully installed en-core-web-sm-3.1.0
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')

class TokenKnowledgeExtractor

TokenKnowledgeExtractor(spacy_model='en_core_web_sm')

Extract knowledge such as token, lemma, pos, tag, noun chunks, etc.

TokenKnowledgeExtractor extracts token-based information from text, e.g. lemma, pos, tag, and noun_chunks. It uses the spaCy package to do the extraction.
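
As a rough illustration of how this kind of per-sentence extraction can be done with spaCy, here is a minimal sketch. It is not necessarily how TokenKnowledgeExtractor is implemented: the function names `sentence_to_dict` and `extract_token_info` are hypothetical, and the dictionary keys are simply chosen to mirror the sample output shown further down. It assumes the `en_core_web_sm` model is installed.

```python
def sentence_to_dict(sent):
    """Convert one spaCy sentence (a Span) into a plain dictionary."""
    tokens = [
        {
            "token": tok.text,
            "lemma": tok.lemma_,
            "pos": tok.pos_,          # coarse-grained part of speech, e.g. NOUN
            "tag": tok.tag_,          # fine-grained tag, e.g. NN
            "is_alpha": tok.is_alpha,
            "is_stop": tok.is_stop,
        }
        for tok in sent
    ]
    noun_chunks = [
        {
            "chunk": chunk.text,
            "root_text": chunk.root.text,
            "root_dep": chunk.root.dep_,
            "root_head": chunk.root.head.text,
        }
        for chunk in sent.noun_chunks
    ]
    return {"text": sent.text, "tokens": tokens, "noun_chunks": noun_chunks}


def extract_token_info(text, spacy_model="en_core_web_sm"):
    """Hypothetical helper: run the pipeline and collect one dict per sentence."""
    import spacy  # imported here so sentence_to_dict can be reused without loading spaCy

    nlp = spacy.load(spacy_model)
    doc = nlp(text)
    return {"sents": [sentence_to_dict(sent) for sent in doc.sents]}
```

Keeping the per-sentence conversion in its own function makes it easy to apply to any sentence Span, whichever pipeline produced it.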

token_ke = TokenKnowledgeExtractor(spacy_model='en_core_web_sm')
input_text = """
Days of riots and looting in South Africa have left more than 70 people dead, hurt thousands of businesses and damaged major infrastructure in some of the worst civil unrest since the end of white minority rule in 1994.
The unrest started after former President Jacob Zuma handed himself over last week to start a 15-month prison sentence for contempt of court.
JOHANNESBURG, July 14 (Reuters) - Days of riots and looting in South Africa have left more than 70 people dead, hurt thousands of businesses and damaged major infrastructure in some of the worst civil unrest since the end of white minority rule in 1994.

What is driving the violence?

ZUMA'S JAILING

The unrest started after former President Jacob Zuma handed himself over last week to start a 15-month prison sentence for contempt of court.

Zuma supporters, who believe he is the victim of a political witch-hunt, burned tyres and blocked roads in his home province of KwaZulu-Natal.

Support for Zuma stems partly from his image as a man of the people during his nine years in power until 2018, and because some see his jailing as an attack on the nation's largest ethnic group, the Zulu.

Although many wealthy and middle-class South Africans were overjoyed when Zuma was ousted after multiple sleaze and graft allegations, he still retains loyal followings in KwaZulu-Natal and some poor, rural areas.
"""
extracted_info = token_ke.extract(input_text)
extracted_info['sents'][2]
{'noun_chunks': [{'chunk': 'The unrest',
   'root_dep': 'nsubj',
   'root_head': 'started',
   'root_text': 'unrest'},
  {'chunk': 'former President Jacob Zuma',
   'root_dep': 'nsubj',
   'root_head': 'handed',
   'root_text': 'Zuma'},
  {'chunk': 'himself',
   'root_dep': 'dobj',
   'root_head': 'handed',
   'root_text': 'himself'},
  {'chunk': 'last week',
   'root_dep': 'pobj',
   'root_head': 'over',
   'root_text': 'week'},
  {'chunk': 'a 15-month prison sentence',
   'root_dep': 'dobj',
   'root_head': 'start',
   'root_text': 'sentence'},
  {'chunk': 'contempt',
   'root_dep': 'pobj',
   'root_head': 'for',
   'root_text': 'contempt'},
  {'chunk': 'court',
   'root_dep': 'pobj',
   'root_head': 'of',
   'root_text': 'court'}],
 'text': 'The unrest started after former President Jacob Zuma handed himself over last week to start a 15-month prison sentence for contempt of court.',
 'tokens': [{'is_alpha': True,
   'is_stop': True,
   'lemma': 'the',
   'pos': 'DET',
   'tag': 'DT',
   'token': 'The'},
  {'is_alpha': True,
   'is_stop': False,
   'lemma': 'unrest',
   'pos': 'NOUN',
   'tag': 'NN',
   'token': 'unrest'},
  {'is_alpha': True,
   'is_stop': False,
   'lemma': 'start',
   'pos': 'VERB',
   'tag': 'VBD',
   'token': 'started'},
  {'is_alpha': True,
   'is_stop': True,
   'lemma': 'after',
   'pos': 'ADP',
   'tag': 'IN',
   'token': 'after'},
  {'is_alpha': True,
   'is_stop': True,
   'lemma': 'former',
   'pos': 'ADJ',
   'tag': 'JJ',
   'token': 'former'},
  {'is_alpha': True,
   'is_stop': False,
   'lemma': 'President',
   'pos': 'PROPN',
   'tag': 'NNP',
   'token': 'President'},
  {'is_alpha': True,
   'is_stop': False,
   'lemma': 'Jacob',
   'pos': 'PROPN',
   'tag': 'NNP',
   'token': 'Jacob'},
  {'is_alpha': True,
   'is_stop': False,
   'lemma': 'Zuma',
   'pos': 'PROPN',
   'tag': 'NNP',
   'token': 'Zuma'},
  {'is_alpha': True,
   'is_stop': False,
   'lemma': 'hand',
   'pos': 'VERB',
   'tag': 'VBD',
   'token': 'handed'},
  {'is_alpha': True,
   'is_stop': True,
   'lemma': 'himself',
   'pos': 'PRON',
   'tag': 'PRP',
   'token': 'himself'},
  {'is_alpha': True,
   'is_stop': True,
   'lemma': 'over',
   'pos': 'ADP',
   'tag': 'IN',
   'token': 'over'},
  {'is_alpha': True,
   'is_stop': True,
   'lemma': 'last',
   'pos': 'ADJ',
   'tag': 'JJ',
   'token': 'last'},
  {'is_alpha': True,
   'is_stop': False,
   'lemma': 'week',
   'pos': 'NOUN',
   'tag': 'NN',
   'token': 'week'},
  {'is_alpha': True,
   'is_stop': True,
   'lemma': 'to',
   'pos': 'PART',
   'tag': 'TO',
   'token': 'to'},
  {'is_alpha': True,
   'is_stop': False,
   'lemma': 'start',
   'pos': 'VERB',
   'tag': 'VB',
   'token': 'start'},
  {'is_alpha': True,
   'is_stop': True,
   'lemma': 'a',
   'pos': 'DET',
   'tag': 'DT',
   'token': 'a'},
  {'is_alpha': False,
   'is_stop': False,
   'lemma': '15',
   'pos': 'NUM',
   'tag': 'CD',
   'token': '15'},
  {'is_alpha': False,
   'is_stop': False,
   'lemma': '-',
   'pos': 'PUNCT',
   'tag': 'HYPH',
   'token': '-'},
  {'is_alpha': True,
   'is_stop': False,
   'lemma': 'month',
   'pos': 'NOUN',
   'tag': 'NN',
   'token': 'month'},
  {'is_alpha': True,
   'is_stop': False,
   'lemma': 'prison',
   'pos': 'NOUN',
   'tag': 'NN',
   'token': 'prison'},
  {'is_alpha': True,
   'is_stop': False,
   'lemma': 'sentence',
   'pos': 'NOUN',
   'tag': 'NN',
   'token': 'sentence'},
  {'is_alpha': True,
   'is_stop': True,
   'lemma': 'for',
   'pos': 'ADP',
   'tag': 'IN',
   'token': 'for'},
  {'is_alpha': True,
   'is_stop': False,
   'lemma': 'contempt',
   'pos': 'NOUN',
   'tag': 'NN',
   'token': 'contempt'},
  {'is_alpha': True,
   'is_stop': True,
   'lemma': 'of',
   'pos': 'ADP',
   'tag': 'IN',
   'token': 'of'},
  {'is_alpha': True,
   'is_stop': False,
   'lemma': 'court',
   'pos': 'NOUN',
   'tag': 'NN',
   'token': 'court'},
  {'is_alpha': False,
   'is_stop': False,
   'lemma': '.',
   'pos': 'PUNCT',
   'tag': '.',
   'token': '.'}]}
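
The per-sentence dictionary above is plain Python data, so it can be post-processed without spaCy. The sketch below shows two possible ways to consume it; the helper names `content_lemmas` and `chunks_by_head` are hypothetical and not part of the package, and `sample` is a trimmed, hand-copied subset of the output above.

```python
def content_lemmas(sentence_info):
    """Return lowercase lemmas of alphabetic tokens that are not stop words."""
    return [
        tok["lemma"].lower()
        for tok in sentence_info["tokens"]
        if tok["is_alpha"] and not tok["is_stop"]
    ]


def chunks_by_head(sentence_info):
    """Group noun chunk texts by the head word of each chunk's root token."""
    grouped = {}
    for chunk in sentence_info["noun_chunks"]:
        grouped.setdefault(chunk["root_head"], []).append(chunk["chunk"])
    return grouped


# Trimmed copy of the output above: first three noun chunks and first three tokens.
sample = {
    "noun_chunks": [
        {"chunk": "The unrest", "root_dep": "nsubj",
         "root_head": "started", "root_text": "unrest"},
        {"chunk": "former President Jacob Zuma", "root_dep": "nsubj",
         "root_head": "handed", "root_text": "Zuma"},
        {"chunk": "himself", "root_dep": "dobj",
         "root_head": "handed", "root_text": "himself"},
    ],
    "tokens": [
        {"is_alpha": True, "is_stop": True, "lemma": "the",
         "pos": "DET", "tag": "DT", "token": "The"},
        {"is_alpha": True, "is_stop": False, "lemma": "unrest",
         "pos": "NOUN", "tag": "NN", "token": "unrest"},
        {"is_alpha": True, "is_stop": False, "lemma": "start",
         "pos": "VERB", "tag": "VBD", "token": "started"},
    ],
}

print(content_lemmas(sample))   # ['unrest', 'start']
print(chunks_by_head(sample))
# {'started': ['The unrest'], 'handed': ['former President Jacob Zuma', 'himself']}
```

With the real result, the same helpers could be applied to each entry of extracted_info['sents'].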