spaCy is a Python library for natural language processing. It comes with pre-trained statistical models that allow it to perform detailed linguistic analysis for English, German, Greek, Spanish, French, Italian, Dutch, and Portuguese. In these languages, spaCy can tag each word in a sentence with its part of speech, identify syntactic relationships (subject, object, etc.), and generate sentence diagrams. It can also reduce words to their root forms (a process linguists call lemmatization) for the above languages, handling both suffixed forms (e.g., "color," "colors," "coloring") and inflected verb forms (e.g., "is," "was," "be"). spaCy can also perform Named Entity Recognition (NER) to recognize "named 'real-world' objects like persons, companies, or locations." For around 40 other languages that lack a statistical model, spaCy can still accomplish simpler tasks like text tokenization and similarity testing. On spaCy's Usage page, readers can find installation instructions for Windows, macOS, and Unix/Linux systems. The Usage page also provides a number of guides for getting started and a series of in-depth code examples. spaCy is free software distributed under the MIT license, with source code available on GitHub.
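As a rough illustration of the two tiers described above, the sketch below shows tokenization with a blank pipeline (no trained model required) and a fuller analysis with a trained model. The helper names `tokenize` and `analyze` are hypothetical, and the model name `en_core_web_sm` is an assumption: it is spaCy's small English model and has to be installed separately before `analyze` will run.

```python
def tokenize(text, lang="en"):
    """Split text into tokens with a blank pipeline (no trained model needed)."""
    # Imported inside the function so this sketch can be loaded even where
    # spaCy is not installed; a real script would import at the top.
    import spacy

    nlp = spacy.blank(lang)  # tokenizer only, no tagger/parser/NER
    return [token.text for token in nlp(text)]


def analyze(text, model="en_core_web_sm"):
    """Return per-token annotations and named entities from a trained model.

    Assumes the named model has already been downloaded and installed.
    """
    import spacy

    nlp = spacy.load(model)
    doc = nlp(text)
    # For each token: surface form, part of speech, dependency label, lemma.
    tokens = [(t.text, t.pos_, t.dep_, t.lemma_) for t in doc]
    # Named entities with their labels (e.g. PERSON, ORG, GPE).
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return tokens, entities
```

Calling `tokenize("spaCy is free software.")` needs only the base library, while `analyze(...)` depends on the trained model being present; the exact tags and entities it returns vary with the model version.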