A Simple Programmable Logic Device (SPLD) is an IC that can be configured to perform arbitrary logic functions. An SPLD is very similar to a Complex PLD (CPLD), but it has fewer I/O pins and programmable elements and consumes less power. It often requires a special programming device to configure, and programming methods are proprietary and may vary by manufacturer. SPLDs are non-volatile and retain their configuration after power is removed. An SPLD includes several configurable logic gates and programmable logic points, and can also include memory and flip-flops.
Atmel SPLD products include the industry-standard 16V8 and 22V10, available with a range of voltage and power-saving options. Atmel offers low-voltage, zero-power, and quarter-power versions, as well as its exclusive "L" low-power devices with automatic power-down, including the battery-friendly ATF22LV10CQZ. In addition to Atmel's proprietary TSSOP package, the smallest package designed for SPLD devices, all popular package types are supported. All versions are EE-based, which gives high reliability and easy reprogramming, and all are supported by popular third-party programmers.
In machine learning, SPLD also stands for self-paced learning with diversity, an extension of self-paced learning (SPL). In this setting, diversity means that the selected samples are dissimilar, that is, they do not all come from the same cluster. The assumption is that the correlation between samples is tied to cluster membership: samples within the same cluster (group) are more similar to each other than samples from different clusters. Diversity can therefore be expressed through clusters. For example, in object recognition the frames of one video are treated as one cluster, and frames from different videos are diverse; more generally, any clustering of the sample space gives diversity across clusters.
Assume that the training set $X=(x_1,\dots,x_n)\in\mathbb{R}^{m\times n}$ can be divided or clustered into $b$ clusters $X^{(1)},\dots,X^{(b)}$, where $X^{(j)}\in\mathbb{R}^{m\times n_j}$ is the set of samples belonging to the $j$-th cluster, $n_j$ is the number of samples in the $j$-th cluster, and $\sum_{j=1}^{b}n_j=n$. Accordingly, the weight vector $v$ of self-paced learning becomes the matrix $V=[v^{(1)},\dots,v^{(b)}]$ in SPLD, where $v^{(j)}=(v^{(j)}_1,\dots,v^{(j)}_{n_j})\in[0,1]^{n_j}$. The key to SPLD is twofold. On the one hand, like traditional SPL, it assigns non-zero weights to "simple" samples. On the other hand, it tends to spread the non-zero elements over as many clusters as possible, that is, over as many column vectors $v^{(j)}$ of $V$ as possible, to ensure sample diversity.
The objective function of SPLD is defined as:
$$(w_{t+1},V_{t+1})=\arg\min_{w,\,V}\Big(r(w)+\sum_{i=1}^{n}v_i f(x_i,y_i,w)-\lambda\sum_{i=1}^{n}v_i-\gamma\|V\|_{2,1}\Big),\quad\text{s.t. } v\in[0,1]^n,$$
Here, $\lambda$ and $\gamma$ are the weights on the negative $\ell_1$ norm (which reflects sample difficulty) and the negative $\ell_{2,1}$ norm (which reflects sample diversity). The negative $\ell_{2,1}$ norm can be written as:
$$-\|V\|_{2,1}=-\sum_{j=1}^{b}\|v^{(j)}\|_2.$$
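To make the effect of this term concrete, here is a minimal sketch that evaluates the negative $\ell_{2,1}$ norm for two toy weight matrices. The function name `neg_l21_norm` and both matrices are hypothetical and chosen only for illustration; each inner list plays the role of one cluster vector $v^{(j)}$.

```python
import math

def neg_l21_norm(V):
    """Return -||V||_{2,1}: minus the sum of the Euclidean norms of the cluster vectors."""
    return -sum(math.sqrt(sum(v_i * v_i for v_i in v_j)) for v_j in V)

# Hypothetical toy data: four selected samples, all in the first cluster.
concentrated = [[1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
# The same number of selected samples, one per cluster.
spread = [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]

print(neg_l21_norm(concentrated))  # -2.0
print(neg_l21_norm(spread))        # -4.0
```

Both selections contain four samples, but the spread selection yields the smaller (more negative) value, so a minimizer of the objective prefers it.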
SPLD introduces a new self-paced regularization term, the negative $\ell_{2,1}$ norm, which favors selecting diverse samples from multiple clusters. In applied statistics, the $\ell_{2,1}$ norm of a parameter matrix $V$ leads to group-wise sparsity of $V$: non-zero elements tend to concentrate in a small number of clusters, that is, in a few columns of the matrix. Conversely, the negative $\ell_{2,1}$ norm has the opposite effect, so non-zero elements tend to be spread over more columns. This anti-group-sparsity can be understood as sample diversity.
SPLD still optimizes the whole objective with ACS (Alternative Convex Search): with one set of parameters $w$ fixed, the other set $v$ is updated, and the two sets are optimized alternately. The difference from traditional SPL is that the self-paced regularization term introduced by SPLD (the negative $\ell_{2,1}$ norm) is non-convex, whereas the $\ell_1$ term used in SPL is convex. Therefore, the commonly used gradient descent or sub-gradient descent algorithms cannot be applied directly to optimize the parameter matrix $V$. Another highlight of SPLD is a simple but effective method that solves this non-convex problem and is guaranteed to find the global optimum of the parameter matrix in closed form:
Within each cluster $j$, sort the samples by their loss $f(x_i,y_i,w)$ in ascending order, with rank $i=1,\dots,n_j$. If the loss of the sample ranked $i$ satisfies $f(\cdot)\le\lambda+\gamma\frac{1}{\sqrt{i}+\sqrt{i-1}}$, set $v^{(j)}_i=1$; otherwise set $v^{(j)}_i=0$. As the rank $i$ increases, the threshold $\lambda+\gamma\frac{1}{\sqrt{i}+\sqrt{i-1}}$ that decides whether a sample is selected becomes lower and lower, so samples ranked later in a cluster, which have larger losses, are harder to select, while samples with larger (but not very large) losses can still be selected if they rank near the top of their own cluster. Traditional SPL does not have this property: its selection threshold is determined by $\lambda$ alone, so when all samples are sorted by loss, the top samples are likely to come from one or a small number of clusters, and sample diversity is ignored. SPLD, in contrast, can include samples that rank lower in the overall ordering, and thus avoids selecting samples from only one or a few clusters.
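The contrast between the two thresholds can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: the function names `spld_select` and `spl_select`, the two toy clusters, and the values of `lam` and `gamma` are all hypothetical, and plain SPL is modeled as a single cut at $\lambda$.

```python
import math

def spld_select(losses, lam, gamma):
    """0/1 weights for one cluster, using the rank-dependent SPLD threshold."""
    # Rank the samples of this cluster by ascending loss (rank starts at 1).
    order = sorted(range(len(losses)), key=lambda idx: losses[idx])
    v = [0] * len(losses)
    for rank, idx in enumerate(order, start=1):
        threshold = lam + gamma / (math.sqrt(rank) + math.sqrt(rank - 1))
        if losses[idx] <= threshold:
            v[idx] = 1
    return v

def spl_select(losses, lam):
    """0/1 weights with a single threshold lam (assumed form of plain SPL)."""
    return [1 if loss <= lam else 0 for loss in losses]

# Hypothetical per-sample losses for an "easy" and a "hard" cluster.
cluster_a = [0.1, 0.2, 0.3]
cluster_b = [0.6, 0.9, 1.5]
lam, gamma = 0.5, 0.4

spl_result = [spl_select(c, lam) for c in (cluster_a, cluster_b)]
spld_result = [spld_select(c, lam, gamma) for c in (cluster_a, cluster_b)]
print(spl_result)   # [[1, 1, 1], [0, 0, 0]]
print(spld_result)  # [[1, 1, 1], [1, 0, 0]]
```

With these toy numbers, plain SPL takes every sample from the easy cluster and none from the hard one, while the rank-dependent threshold also admits the best-ranked sample of the hard cluster.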
The following derivation shows how to optimize the parameter $v$ in SPLD, that is, how to solve the non-convex optimization problem introduced by the negative $\ell_{2,1}$ norm.
The objective function of SPLD optimization is:
$$\min_{v\in[0,1]^n}E(w,V;\lambda,\gamma)=\sum_{i=1}^{n}v_iL(y_i,f(x_i,w))-\lambda\sum_{i=1}^{n}v_i-\gamma\|V\|_{2,1}$$
Here, $L(y_i,f(x_i,w))$ is the loss (or negative likelihood) of sample $x_i$, and the $\ell_{2,1}$ norm $\|V\|_{2,1}$ represents group sparsity. For brevity, we write the objective $E(w,V;\lambda,\gamma)$ as $E(v)$ and the sample loss $L(y_i,f(x_i,w))$ as $L_i$.
Because we assume that the data is heterogeneous and can be divided or clustered into $b$ clusters, the sum $\sum_{i=1}^{n}$ can be rewritten as $\sum_{j=1}^{b}\sum_{i=1}^{n_j}$, that is,
$$E(v)=\sum_{j=1}^{b}E(v^{(j)}),$$
where $E(v^{(j)})=\sum_{i=1}^{n_j}v^{(j)}_iL^{(j)}_i-\lambda\sum_{i=1}^{n_j}v^{(j)}_i-\gamma\|v^{(j)}\|_2$ and $L^{(j)}_i$ is the loss of the $i$-th sample in the $j$-th cluster. The original optimization problem therefore decomposes into a series of independent subproblems:
$$v^{(j)\star}=\arg\min_{v^{(j)}\in[0,1]^{n_j}}E(v^{(j)}).$$
For any subproblem, the self-paced parameter $v$ acts as a sample selector, so its entries take the value 0 or 1. Suppose the number of non-zero elements in $v^{(j)}$ is $k$, that is, the $\ell_0$ norm of the vector is $k$, written $\|v^{(j)}\|_0=k$. This can be used as a constraint in each subproblem to avoid selecting all samples. The optimization of any subproblem can therefore be rewritten as
$$v^{(j)\star}=\arg\min_{\substack{v^{(j)}\in[0,1]^{n_j}\\ \|v^{(j)}\|_0=k}}E(v^{(j)})=\arg\min_{v^{(j)}(k)}E(v^{(j)}(k))$$
That is, the solution $v^{(j)\star}$ is one of the candidates $v^{(j)}(1),\dots,v^{(j)}(n_j)$, where $v^{(j)}(k)$ is the minimizer with exactly $k$ non-zero elements; choosing the best candidate guarantees that the subproblem attains its minimum.
Without loss of generality, sort the samples $(x^{(j)}_1,x^{(j)}_2,\dots,x^{(j)}_{n_j})$ of the $j$-th subproblem so that their losses are in ascending order. Each candidate $v^{(j)}(k)$ has exactly $k$ non-zero elements, and for a fixed $k$ the objective is smallest when these are the $k$ samples with the smallest losses. The objective to be optimized can then be written as:
$$E(v^{(j)}(k))=\sum_{i=1}^{n_j}v^{(j)}_iL^{(j)}_i-\lambda\sum_{i=1}^{n_j}v^{(j)}_i-\gamma\|v^{(j)}\|_2=\sum_{i=1}^{n_j}v^{(j)}_iL^{(j)}_i-\lambda k-\gamma\sqrt{k}$$
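As a small numerical illustration, the sketch below evaluates this objective for every candidate size $k$ in one cluster and picks the smallest value. The helper name `objective_for_k`, the loss values, and the settings of `lam` and `gamma` are hypothetical; the candidate $k=0$ (nothing selected) is included as an extra assumption so that the scan also covers the empty selection.

```python
import math

def objective_for_k(sorted_losses, k, lam, gamma):
    """E(v(k)): the k smallest losses get weight 1, all other samples weight 0."""
    return sum(sorted_losses[:k]) - lam * k - gamma * math.sqrt(k)

# Hypothetical losses of one cluster, sorted in ascending order.
losses = sorted([0.9, 0.1, 0.6, 1.5])
lam, gamma = 0.5, 0.4

values = [objective_for_k(losses, k, lam, gamma) for k in range(len(losses) + 1)]
best_k = min(range(len(values)), key=values.__getitem__)

for k, value in enumerate(values):
    print(k, round(value, 4))
print("best k:", best_k)  # best k: 2
```

The printed values first fall and then rise as $k$ grows, which is the shape the difference argument that follows relies on.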
Then, for any two adjacent elements of the sequence $E(v^{(j)}(1)),\dots,E(v^{(j)}(n_j))$, compute their difference $\mathrm{diff}_k$:
$$\mathrm{diff}_k=E(v^{(j)}(k+1))-E(v^{(j)}(k))=L^{(j)}_{k+1}-\lambda-\gamma\left(\sqrt{k+1}-\sqrt{k}\right)=L^{(j)}_{k+1}-\left(\lambda+\gamma\frac{1}{\sqrt{k+1}+\sqrt{k}}\right)$$
In this difference, $L^{(j)}_k$ is the loss of the sample ranked $k$. Because the losses of all samples in the $j$-th subproblem were sorted in ascending order at the start, $L^{(j)}_k$ is a monotonically increasing sequence as $k$ increases ($k=1,\dots,n_j$), while the corresponding term $\lambda+\gamma\frac{1}{\sqrt{k+1}+\sqrt{k}}$ is monotonically decreasing. It follows that the difference sequence $\mathrm{diff}_k$ is monotonically increasing.
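Both the last equality in the difference and the monotonicity claim rest on a single algebraic step, rationalizing $\sqrt{k+1}-\sqrt{k}$. Spelled out with the symbols already defined, the step reads:

```latex
\begin{align*}
\sqrt{k+1}-\sqrt{k}
  &= \frac{(\sqrt{k+1}-\sqrt{k})(\sqrt{k+1}+\sqrt{k})}{\sqrt{k+1}+\sqrt{k}}
   = \frac{(k+1)-k}{\sqrt{k+1}+\sqrt{k}}
   = \frac{1}{\sqrt{k+1}+\sqrt{k}}, \\
\mathrm{diff}_k
  &= L^{(j)}_{k+1}-\lambda-\gamma\left(\sqrt{k+1}-\sqrt{k}\right)
   = L^{(j)}_{k+1}-\left(\lambda+\gamma\frac{1}{\sqrt{k+1}+\sqrt{k}}\right).
\end{align*}
```

Because the denominator $\sqrt{k+1}+\sqrt{k}$ grows with $k$, the bracketed threshold shrinks as $k$ grows; subtracting a shrinking threshold from a non-decreasing loss gives an increasing $\mathrm{diff}_k$.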
Finally, when $\mathrm{diff}_k<0$ the objective $E(v^{(j)}(k))$ is decreasing in $k$, and when $\mathrm{diff}_k>0$ it is increasing. Since $\mathrm{diff}_k$ is monotonically increasing, the objective attains its minimum where $\mathrm{diff}_k$ changes sign, with $\mathrm{diff}_k=0$ as the boundary case. The solution $v^{(j)\star}$ therefore corresponds to the condition
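$$\mathrm{diff}_k=0\;\Leftrightarrow\;L^{(j)}_{k+1}=\lambda+\gamma\frac{1}{\sqrt{k+1}+\sqrt{k}}\;\Leftrightarrow\;L^{(j)}_{k}=\lambda+\gamma\frac{1}{\sqrt{k}+\sqrt{k-1}}\quad\text{(after re-indexing } k+1\to k\text{)}$$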
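This condition can be sanity-checked by brute force on a small cluster. The sketch below enumerates every 0/1 weight vector, evaluates the cluster objective directly, and compares the best vector with the one obtained from the threshold. The names `energy` and `k_from_threshold` and all numbers are hypothetical; the check covers only binary vectors, in line with the earlier statement that each weight takes the value 0 or 1.

```python
import itertools
import math

def energy(v, losses, lam, gamma):
    """Cluster objective: sum(v_i * L_i) - lam * sum(v_i) - gamma * ||v||_2."""
    return (sum(v_i * loss for v_i, loss in zip(v, losses))
            - lam * sum(v)
            - gamma * math.sqrt(sum(v_i * v_i for v_i in v)))

def k_from_threshold(sorted_losses, lam, gamma):
    """Largest rank k whose loss is still at or below lam + gamma / (sqrt(k) + sqrt(k-1))."""
    k = 0
    for rank, loss in enumerate(sorted_losses, start=1):
        if loss <= lam + gamma / (math.sqrt(rank) + math.sqrt(rank - 1)):
            k = rank
        else:
            break  # losses rise and the threshold falls, so no later rank can pass
    return k

losses = [0.1, 0.6, 0.9, 1.5]  # hypothetical losses, already sorted ascending
lam, gamma = 0.5, 0.4

k = k_from_threshold(losses, lam, gamma)
closed_form = [1] * k + [0] * (len(losses) - k)

# Exhaustive search over all 2^n binary weight vectors.
brute_force = min(itertools.product([0, 1], repeat=len(losses)),
                  key=lambda v: energy(v, losses, lam, gamma))

print(closed_form)        # [1, 1, 0, 0]
print(list(brute_force))  # [1, 1, 0, 0]
```

Exhaustive search is only feasible for a handful of samples; it is used here purely to confirm that the threshold picks the same vector on this toy input.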
To sum up, this proves the following rule: for the samples in each cluster, sorted by loss $L^{(j)}_i$ in ascending order with rank $i=1,\dots,n_j$, set $v^{(j)}_i=1$ if $L^{(j)}_i\le\lambda+\gamma\frac{1}{\sqrt{i}+\sqrt{i-1}}$, and set $v^{(j)}_i=0$ otherwise.
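Combining this rule with the alternating scheme described earlier (fix $w$ and update $V$, then fix $V$ and update $w$) gives a complete loop. The following is a toy sketch under strong assumptions that are not part of the text above: the model is a single scalar $w$, the loss is the squared error $(x-w)^2$, the regularizer $r(w)$ is zero, and $\lambda$ and $\gamma$ stay fixed across iterations. The names `select_cluster` and `acs` and the data are hypothetical.

```python
import math

def select_cluster(losses, lam, gamma):
    """Closed-form update of one cluster vector v^(j) for fixed w."""
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    v = [0.0] * len(losses)
    for rank, i in enumerate(order, start=1):
        if losses[i] <= lam + gamma / (math.sqrt(rank) + math.sqrt(rank - 1)):
            v[i] = 1.0
    return v

def acs(clusters, w0, lam, gamma, iters=10):
    """Alternate between updating V (w fixed) and updating w (V fixed)."""
    w = w0
    V = [[0.0] * len(c) for c in clusters]
    for _ in range(iters):
        # Step 1: w fixed, update every cluster vector independently.
        new_V = [select_cluster([(x - w) ** 2 for x in c], lam, gamma)
                 for c in clusters]
        # Step 2: V fixed, update w. For squared error the weighted mean
        # of the selected samples minimizes sum(v_i * (x_i - w)^2).
        total = sum(sum(v) for v in new_V)
        if total > 0:
            new_w = sum(v_i * x for c, v in zip(clusters, new_V)
                        for x, v_i in zip(c, v)) / total
        else:
            new_w = w  # nothing selected, keep the current model
        if new_V == V and new_w == w:
            break  # neither block of parameters changed
        V, w = new_V, new_w
    return w, V

# Hypothetical data: two clusters, each with one obvious outlier.
clusters = [[1.0, 1.1, 0.9, 5.0], [1.3, 1.4, 6.0]]
w, V = acs(clusters, w0=1.0, lam=0.05, gamma=0.2)
print(round(w, 4), V)
```

On this toy input the loop stops after a few rounds with the outliers 5.0 and 6.0 left out and samples from both clusters selected. In practice the model update would be whatever training step fits the chosen model and loss.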