
# one-hot

One-hot (one-bit-valid) coding logic is logic in which exactly one bit is valid (high) and all other bits are invalid (low).


### Introduction

One-hot encoding, also known as one-bit-effective encoding, uses an N-bit state register to encode N states. Each state has its own register bit, and only one bit is set at any time.
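The register-level idea can be simulated in a few lines. This is a minimal sketch, not HDL: on an FPGA the register would be described in Verilog or VHDL, and the state names `IDLE`, `READ`, `WRITE`, `DONE` and the helpers `is_one_hot` and `decode` are hypothetical, chosen only to illustrate that state *i* owns bit *i*.

```python
# Hypothetical 4-state machine; the names are illustrative, not from the source.
STATES = ["IDLE", "READ", "WRITE", "DONE"]

# One-hot assignment: state i gets its own register bit, i.e. the value 1 << i.
ONE_HOT = {name: 1 << i for i, name in enumerate(STATES)}

def is_one_hot(value: int) -> bool:
    """True if exactly one bit of `value` is set (a legal one-hot register value)."""
    return value != 0 and (value & (value - 1)) == 0

def decode(value: int) -> str:
    """Map a legal one-hot register value back to its state name."""
    if not is_one_hot(value):
        raise ValueError(f"illegal one-hot value: {value:#06b}")
    # The position of the single set bit is the state index.
    return STATES[value.bit_length() - 1]

for name, code in ONE_HOT.items():
    print(f"{name:<5} -> {code:04b}")
```

With four states the register is 4 bits wide (0001, 0010, 0100, 1000). The `value & (value - 1)` test clears the lowest set bit, so it is zero only when a single bit was set; a value such as 0110 is rejected as an illegal state.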

In machine learning, one-hot encoding represents categorical variables as binary vectors. The categorical values are first mapped to integers; each integer is then represented as a binary vector that is all zeros except for a 1 at the index given by that integer.
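The two steps (categories to integers, integers to binary vectors) can be sketched as below. The function name `one_hot_encode` is hypothetical, and assigning integers in order of first appearance is an assumption; any fixed order works as long as it is agreed in advance.

```python
def one_hot_encode(values):
    """Encode a list of categorical values as one-hot binary vectors.

    Step 1: map each distinct category to an integer (here: order of first
    appearance, which is an assumption; any fixed order works).
    Step 2: turn each integer into a vector of zeros with a 1 at that index.
    """
    categories = list(dict.fromkeys(values))  # distinct values, first-seen order
    index = {cat: i for i, cat in enumerate(categories)}
    vectors = []
    for v in values:
        vec = [0] * len(categories)
        vec[index[v]] = 1
        vectors.append(vec)
    return categories, vectors

cats, vecs = one_hot_encode(["red", "green", "red", "blue"])
print(cats)  # ['red', 'green', 'blue']
print(vecs)  # [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
```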

### Detailed one-hot encoding process

For example, how do we one-hot encode "hello world"?

1. Determine the object to be encoded: "hello world".

2. Determine the categorical variable. The characters are h, e, l, l, o, space, w, o, r, l, d, drawn from 27 possible categories (26 lowercase letters plus the space).

3. The problem is then equivalent to converting 11 samples (one per character), each with 27 features, into binary vector representations.

There is a premise here: if the features are arranged in a different order, the corresponding binary vectors are also different. For example, placing the space in the first column gives a different one-hot encoding than placing "a" in the first column.

Therefore, the order of the features must be agreed in advance:

1. Integer-code the 27 features: a = 0, b = 1, c = 2, ..., z = 25, space = 26.

2. Arrange the 27 features in ascending order of their integer codes.
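The agreed ordering (a = 0 through z = 25, space = 26) can be sketched in Python as follows. The helper name `encode_text` is hypothetical; characters outside a-z and the space are simply not handled and raise `KeyError`.

```python
import string

# Feature order agreed in advance: a=0, b=1, ..., z=25, space=26.
ALPHABET = list(string.ascii_lowercase) + [" "]
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}

def encode_text(text: str) -> list[list[int]]:
    """One-hot encode each character of `text` as a 27-element vector."""
    rows = []
    for ch in text:
        row = [0] * len(ALPHABET)
        row[INDEX[ch]] = 1  # raises KeyError for characters outside a-z and space
        rows.append(row)
    return rows

matrix = encode_text("hello world")
print(len(matrix), len(matrix[0]))  # 11 27
print([row.index(1) for row in matrix])
# [7, 4, 11, 11, 14, 26, 22, 14, 17, 11, 3]
```

The result is an 11 x 27 matrix: one row per sample (character), one column per feature, with exactly one 1 in each row.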

Another example: how do we one-hot encode ["China", "United States", "Japan", "United States"]?

1. Determine the object to be encoded: ["China", "United States", "Japan", "United States"].

2. Determine the categorical variable: China, the United States and Japan, 3 categories in total.

3. The problem is then equivalent to converting 4 samples, each with 3 features, into binary vector representations.

We first integer-code the features (China = 0, United States = 1, Japan = 2) and arrange them in ascending order of their codes.

The one-hot encoding is as follows:

["China", "United States", "Japan", "United States"] ---> [[1,0,0], [0,1,0], [0,0,1], [0,1,0]]
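A minimal Python sketch of this example, using the integer codes given in the text. The helpers `encode` and `decode` are hypothetical; `decode` shows that the encoding is reversible, because the position of the single 1 recovers the integer code.

```python
# Integer codes agreed in advance, as in the text: China=0, United States=1, Japan=2.
CODES = {"China": 0, "United States": 1, "Japan": 2}
NAMES = {code: name for name, code in CODES.items()}

def encode(samples):
    """Return one one-hot row per sample, with columns ordered by integer code."""
    n = len(CODES)
    rows = []
    for s in samples:
        row = [0] * n
        row[CODES[s]] = 1
        rows.append(row)
    return rows

def decode(rows):
    """Invert encode(): the position of the single 1 gives the integer code."""
    return [NAMES[row.index(1)] for row in rows]

data = ["China", "United States", "Japan", "United States"]
encoded = encode(data)
print(encoded)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
```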

### Why do you need one-hot encoding?

One-hot encoding converts categorical variables into a form that machine learning algorithms can use easily.

The "hello world" example above is a multi-class problem with 27 categories. Each sample belongs to exactly one category (its vector is 1 in the corresponding feature and 0 in all the others), and a classifier's output is usually the probability of belonging to each category. Representing the labels as one-hot vectors therefore makes it convenient to compute loss functions (such as cross-entropy loss) and accuracy.
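To show why one-hot labels are convenient, here is a small sketch of cross-entropy, -Σ y_k log p_k, against a one-hot label: every term with y_k = 0 vanishes, so the loss reduces to -log of the probability assigned to the true class. The probability vector is a hypothetical model output, and the function names are illustrative.

```python
import math

def cross_entropy(one_hot: list[int], probs: list[float]) -> float:
    """-sum(y_k * log p_k); terms with y_k == 0 contribute nothing and are skipped."""
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs) if y)

def is_correct(one_hot: list[int], probs: list[float]) -> bool:
    """A prediction is correct if the most probable class is the labelled one."""
    predicted = max(range(len(probs)), key=probs.__getitem__)
    return predicted == one_hot.index(1)

label = [0, 1, 0]          # true class is index 1
probs = [0.2, 0.7, 0.1]    # hypothetical model output
print(cross_entropy(label, probs))  # -log(0.7), about 0.357
print(is_correct(label, probs))     # True
```

Skipping the zero-label terms also avoids evaluating log(0) when the model assigns zero probability to a wrong class.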

### Defects of one-hot encoding

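One-hot encoding assumes that the categories are independent of one another: every pair of distinct one-hot vectors is orthogonal and equally far apart, so the encoding cannot express that some categories are closer than others. If there is a continuous relationship between the categories, a distributed representation may be more appropriate.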
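The limitation can be checked directly. In this sketch the ordered categories "small", "medium", "large" are a hypothetical example: after one-hot encoding, "small" is exactly as far from "medium" as it is from "large" (distance √2), so the ordering that relates them is lost.

```python
import math

def one_hot(i: int, n: int) -> list[int]:
    """Vector of length n with a single 1 at index i."""
    vec = [0] * n
    vec[i] = 1
    return vec

def distance(a, b) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical ordered categories: the order carries meaning, one-hot discards it.
sizes = ["small", "medium", "large"]
vecs = {s: one_hot(i, len(sizes)) for i, s in enumerate(sizes)}

print(distance(vecs["small"], vecs["medium"]))  # sqrt(2)
print(distance(vecs["small"], vecs["large"]))   # also sqrt(2)
```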
