Principal component analysis (PCA) is a widely used dimensionality reduction algorithm. PCA essentially finds an orthogonal projection of the data in a dataset. Through this orthogonal projection, PCA finds the directions of maximum variance in the dataset, which in turn reveal the linear correlation among the dataset's features.
In other words, if a given dataset has some linearly correlated features, PCA can find a suitable orthogonal direction that captures most of the variation in that dataset along a single direction.
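As a small illustration of this idea, here is a minimal sketch using NumPy. The data is made up for the example (two features where the second is roughly twice the first), and the variable names are only illustrative. It computes the covariance matrix of the centered data and takes the eigenvector with the largest eigenvalue as the first principal direction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: two features that are almost linearly related (y is about 2x).
x = rng.normal(size=500)
y = 2.0 * x + rng.normal(scale=0.1, size=500)
X = np.column_stack([x, y])

# Center the data, then eigendecompose its covariance matrix.
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigenvalues in ascending order

# The eigenvector with the largest eigenvalue is the first principal direction.
first_direction = eigenvectors[:, -1]
explained_ratio = eigenvalues[-1] / eigenvalues.sum()

print("first principal direction:", first_direction)
print("share of variance along it:", explained_ratio)
```

Because the two features are strongly correlated, almost all of the variance should lie along one direction, roughly parallel to (1, 2). The second eigenvector is orthogonal to the first.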
PCA is built from a number of principal components. Let us look at what a principal component is.
Principal component: A principal component is a new variable (new dataset) created from the initial variables (raw dataset) through a linear combination, or mixture, of them.
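The "linear combination" can be written out explicitly. The notation here is an assumption introduced for illustration and is not from the original text: $x$ is one data point with $d$ initial variables, $\mu$ is the mean of the data, and $w_k$ is the unit-length weight vector of the $k$-th principal component.

```latex
z_k = w_k^{\top}(x - \mu) = \sum_{j=1}^{d} w_{kj}\,(x_j - \mu_j),
\qquad \lVert w_k \rVert = 1
```

Each new variable $z_k$ is therefore a weighted sum of the centered initial variables, and the weights $w_{kj}$ say how much each initial variable contributes to that principal component.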
The new variables consist of a number of principal components. There can be one or more principal components. That is, if a dataset has 100 dimensions, it has 100 principal components. The principal components are arranged in the following order, based on how much information from the data each one holds ...
The new variables are completely uncorrelated with one another, and most of the information in the initial variables is compressed into the 1st principal component.
PCA tries to put as much information as possible into the 1st principal component, then as much of the remaining information as possible into the 2nd principal component, and so on. The ordering of the principal components is thus determined by the information they hold. This is shown in the figure below. [picture here]
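The ordering and the uncorrelatedness can be checked numerically. The following is a minimal sketch with NumPy on a made-up 5-feature dataset (the data and all names are assumptions for illustration). It sorts the principal components from highest to lowest variance, projects the data onto them, and looks at the covariance between the resulting new variables.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dataset: 300 samples, 5 features, mixed so the features correlate.
latent = rng.normal(size=(300, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])
mixing = rng.normal(size=(5, 5))
X = latent @ mixing

X_centered = X - X.mean(axis=0)
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X_centered, rowvar=False))

# eigh returns ascending order; reverse so component 1 has the most variance.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Project the data onto the principal directions to get the new variables.
scores = X_centered @ eigenvectors

# Share of total variance held by each component, highest first.
explained_ratio = eigenvalues / eigenvalues.sum()

# Covariance between different new variables (off-diagonal entries).
score_cov = np.cov(scores, rowvar=False)
off_diagonal = score_cov - np.diag(np.diag(score_cov))

print("explained variance ratio:", np.round(explained_ratio, 4))
print("largest off-diagonal covariance:", np.abs(off_diagonal).max())
```

The explained variance ratios come out in non-increasing order and sum to 1, and the off-diagonal covariances are zero up to floating-point error, which is what "totally uncorrelated" means here.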
With the principal components arranged in the order above (higher information to lower information), we can very easily create a lower-dimensional dataset (new dataset) while losing only a little information.
So, by discarding the low-information principal components and keeping the rest, a new set of variables (new dataset) is created from the initial variables (raw dataset).
Example: As we know, if a dataset has 100 dimensions, it also has 100 principal components. When PCA performs dimensionality reduction, it discards the low-variance principal components and creates a lower-dimensional dataset from the higher-dimensional one. That is, to find the optimal principal components, PCA always treats low-information, low-variance directions as noise. PCA drops these noisy components and creates a new dataset whose dimension is much lower than that of the main dataset (how much lower depends on the information captured by each principal component).
Suppose the main dataset is called A. Dataset A has 100 dimensions and therefore 100 principal components. Of these 100 principal components, the first 20 hold 95% of the information in the data.
PCA then treats principal components 21 to 100 as noise and removes them. The remaining 20 principal components are used to create a new dataset, B.
So, by applying PCA, a lower-dimensional dataset (B: 20 dimensions) is created from the higher-dimensional dataset (A: 100 dimensions), and B holds 95% of the information in A.
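The A-to-B example can be sketched in code. This is only an illustration under stated assumptions: the dataset A below is synthetic (built so that roughly 20 directions carry most of the variance), "information" is measured as explained variance, and the 95% threshold and all names are chosen for the example. The number of components actually kept depends on the data, so it will not necessarily be exactly 20.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical dataset A: 1000 samples, 100 features, built so that about
# 20 directions carry most of the variance and the rest is small noise.
n_samples, n_features, n_strong = 1000, 100, 20
signal = rng.normal(size=(n_samples, n_strong)) @ rng.normal(size=(n_strong, n_features))
A = signal + rng.normal(scale=0.5, size=(n_samples, n_features))

# Principal components from the covariance matrix, sorted by variance (highest first).
A_centered = A - A.mean(axis=0)
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(A_centered, rowvar=False))
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Smallest k whose cumulative explained variance reaches the 95% target.
cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()
k = int(np.searchsorted(cumulative, 0.95) + 1)

# Dataset B: the data expressed in the first k principal components only.
B = A_centered @ eigenvectors[:, :k]

print("A shape:", A.shape)
print("B shape:", B.shape)
print("variance retained:", cumulative[k - 1])
```

Here B has the same number of rows as A but only k columns, and it retains at least 95% of the variance of A. The components beyond k are the ones treated as noise and dropped.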
Note: It has been observed that noise features or low-variance data in supervised