Arduino Nano 33 BLE Sense Microcontroller: Hardware and GPIO Functions

The Arduino Nano 33 BLE Sense is an Arduino microcontroller board with a range of onboard sensors, including a microphone and sensors for light and temperature. It can be programmed with Arduino C and MicroPython to read and write data. This article introduces the board, explains how to use its digital and analog pins, and describes which functions it supports.

Wikipedia Article Crawler & Clustering: Text Classification with Spacy

Spacy is a powerful NLP library that performs many NLP tasks in its default configuration, including tokenization, lemmatization, and part-of-speech tagging. These steps can be extended with a text classification task, for which training data is provided as preprocessed text paired with its expected categories as dictionary objects. Both multi-label and single-label classification are supported.
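The shape of such training data can be sketched in plain Python. This is a minimal sketch that assumes Spacy's convention of pairing each text with an annotation dictionary under a `cats` key; the labels and the `make_example` helper are hypothetical, and no Spacy call is made here.

```python
# Hypothetical category labels, used only for illustration.
LABELS = ["science", "history", "sports"]


def make_example(text, categories, multi_label=False):
    """Pair a preprocessed text with a {"cats": {label: score}} dictionary.

    Single-label: exactly one label is 1.0, all others 0.0.
    Multi-label: any number of labels may be 1.0.
    """
    unknown = set(categories) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    if not multi_label and len(categories) != 1:
        raise ValueError("single-label examples need exactly one category")
    cats = {label: (1.0 if label in categories else 0.0) for label in LABELS}
    return (text, {"cats": cats})


train_data = [
    make_example("the experiment measured particle decay", ["science"]),
    make_example("the olympic games in ancient greece",
                 ["history", "sports"], multi_label=True),
]

for text, annotations in train_data:
    print(text, "->", annotations["cats"])
```

In Spacy v3, such pairs are typically turned into `Example` objects with `Example.from_dict(nlp.make_doc(text), annotations)` before training, and the exclusive and multi-label cases map to the separate `textcat` and `textcat_multilabel` components.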

NLP with Spacy: Custom Text Classification Pipeline

Spacy is a powerful NLP library in which many NLP tasks, like tokenization, lemmatization, part-of-speech tagging, and named entity recognition, are provided out of the box with pretrained models. All of these tasks are wrapped by a pipeline object, an internal abstraction of different functions that are applied step by step to a given text. This pipeline can be both customized and extended with self-written functions.
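The pipeline idea itself, named functions applied one after another to a document, can be illustrated without Spacy. The following is a minimal pure-Python sketch; `Doc`, `Pipeline`, and the component functions are hypothetical, and although the method names loosely echo Spacy's, the signatures differ. In Spacy v3 itself, a self-written function is registered with the `@Language.component` decorator and added with `nlp.add_pipe`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Doc:
    """Hypothetical stand-in for a processed document (not Spacy's Doc)."""
    text: str
    tokens: List[str] = field(default_factory=list)
    attrs: Dict[str, object] = field(default_factory=dict)


Component = Callable[[Doc], Doc]


class Pipeline:
    """Applies named components to a Doc, one after another."""

    def __init__(self) -> None:
        self.components: List[Tuple[str, Component]] = []

    @property
    def pipe_names(self) -> List[str]:
        return [name for name, _ in self.components]

    def add_pipe(self, name: str, func: Component, before: Optional[str] = None) -> None:
        # Append at the end, or insert in front of an existing component.
        if before is None:
            self.components.append((name, func))
        else:
            self.components.insert(self.pipe_names.index(before), (name, func))

    def remove_pipe(self, name: str) -> None:
        self.components = [(n, f) for n, f in self.components if n != name]

    def __call__(self, text: str) -> Doc:
        doc = Doc(text)
        for _, func in self.components:
            doc = func(doc)
        return doc


def tokenizer(doc: Doc) -> Doc:
    doc.tokens = doc.text.split()
    return doc


def lowercase(doc: Doc) -> Doc:
    doc.tokens = [t.lower() for t in doc.tokens]
    return doc


def token_count(doc: Doc) -> Doc:
    # A self-written extension that stores a custom attribute on the Doc.
    doc.attrs["n_tokens"] = len(doc.tokens)
    return doc


nlp = Pipeline()
nlp.add_pipe("tokenizer", tokenizer)
nlp.add_pipe("token_count", token_count)
nlp.add_pipe("lowercase", lowercase, before="token_count")

doc = nlp("Spacy Wraps Tasks In A Pipeline")
print(nlp.pipe_names, doc.tokens, doc.attrs)
```

Because each component receives and returns the same document object, steps can be reordered, removed, or added without touching the others.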

Wikipedia Article Crawler & Clustering: KMeans

Wikipedia is a rich source of information and knowledge. Conveniently structured into articles with categories and links to other articles, it also forms a network of related documents. My NLP project downloads, processes, and applies machine learning algorithms to Wikipedia articles.

NLP: Text Vectorization Methods with SciKit Learn

SciKit Learn is an extensive library for machine learning projects, including many classification algorithms, methods for training and for collecting metrics, and tools for preprocessing input data. In every NLP project, text needs to be vectorized before it can be processed by machine learning algorithms. Common vectorization methods are one-hot encoding, count encoding, frequency encoding, and word vectors or word embeddings. Several of these methods are available in SciKit Learn as well.

NLP: Text Vectorization Methods from Scratch

NLP projects work with text, but text cannot be used by machine learning algorithms unless transformed into a numerical representation. This representation is typically called a vector, and it can be applied to any reasonable unit of a text: individual tokens, n-grams, sentences, paragraphs, or even whole documents.
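To make this concrete, here is a minimal from-scratch sketch of two such representations: a one-hot vector for a single token and a count vector for a larger unit such as a sentence or document. It assumes simple lowercasing and whitespace tokenization, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def build_vocabulary(texts: List[str]) -> Dict[str, int]:
    """Map every distinct lowercase token to a fixed column index."""
    tokens = sorted({tok for text in texts for tok in text.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def one_hot(token: str, vocab: Dict[str, int]) -> List[int]:
    """Vector for a single token: all zeros except its own column."""
    vec = [0] * len(vocab)
    vec[vocab[token.lower()]] = 1
    return vec


def count_vector(text: str, vocab: Dict[str, int]) -> List[int]:
    """Vector for a sentence, paragraph, or document: how often each
    vocabulary token occurs in it. Tokens outside the vocabulary are skipped."""
    vec = [0] * len(vocab)
    for tok, n in Counter(text.lower().split()).items():
        if tok in vocab:
            vec[vocab[tok]] = n
    return vec


corpus = ["The cat sat", "The cat sat on the mat"]
vocab = build_vocabulary(corpus)
print(vocab)
print(one_hot("mat", vocab))
print(count_vector(corpus[1], vocab))
```

Both vectors share the same columns, which is what lets a machine learning algorithm compare tokens and documents numerically.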

NLP Project: Wikipedia Article Crawler & Classification - Corpus Transformation Pipeline

My NLP project downloads, processes, and applies machine learning algorithms to Wikipedia articles. My last article showed the project's outline and established its foundation. First, a Wikipedia crawler object that searches articles by their name, extracts their title, categories, content, and related pages, and stores each article as a plaintext file. Second, a corpus object that processes the complete set of articles, allows convenient access to individual files, and provides global data like the number of individual tokens.
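A corpus object of this kind might look roughly like the following sketch. The class name, the one-`.txt`-file-per-article layout, and whitespace tokenization are all assumptions for illustration; the `fileids()` and `raw()` method names loosely follow NLTK's corpus readers.

```python
import tempfile
from pathlib import Path
from typing import List, Optional


class WikipediaCorpus:
    """Hypothetical corpus object over a directory of plaintext article files."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        # Assumption: the crawler stored one ".txt" file per article.
        self.files = sorted(self.root.glob("*.txt"))

    def fileids(self) -> List[str]:
        """Names of all article files in the corpus."""
        return [f.name for f in self.files]

    def raw(self, fileid: str) -> str:
        """Full text of a single article."""
        return (self.root / fileid).read_text(encoding="utf-8")

    def tokens(self, fileid: Optional[str] = None) -> List[str]:
        """Whitespace tokens of one article, or of all articles if no fileid is given."""
        ids = [fileid] if fileid is not None else self.fileids()
        return [tok for fid in ids for tok in self.raw(fid).split()]

    def vocabulary_size(self) -> int:
        """Number of distinct tokens across the whole corpus."""
        return len(set(self.tokens()))


# Usage with two tiny stand-in articles in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "Cat.txt").write_text("the cat sat on the mat", encoding="utf-8")
    Path(tmp, "Dog.txt").write_text("the dog barked", encoding="utf-8")
    corpus = WikipediaCorpus(tmp)
    ids = corpus.fileids()
    first_article = corpus.raw(ids[0])
    n_tokens = len(corpus.tokens())
    n_vocab = corpus.vocabulary_size()

print(ids, n_tokens, n_vocab)
```

Keeping file access and global statistics in one object means later pipeline steps can ask the corpus for what they need instead of reading files themselves.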