- Decision trees make predictions through recursive splits chosen to reduce impurity, measured with criteria such as Gini, entropy or variance.
- Information Gain guides the choice of feature and threshold at each node, which lets trees handle both regression and classification.
- Hyperparameters such as max_depth, min_samples_split and min_information_gain control overfitting and tree complexity.
- Understanding how a single tree works is essential before moving on to ensembles such as random forests, which stabilize predictions and improve performance.

Coding a decision tree regressor from scratch is one of the most eye-opening exercises you can do if you really want to understand how tree-based models think and why they are so popular in machine learning. Instead of treating the tree as a mysterious black box, you will see how each split is chosen, how impurity is measured and how predictions are produced at the leaves, for both regression and classification problems.
In this guide we walk through the core ideas behind decision trees, the cost functions they use, how they search for the best split, and how to code a basic tree that supports both regression and classification using only fundamental concepts such as loops, conditionals and simple arithmetic. Along the way we compare regression and classification trees, connect the theory to practical use in tools such as Python and R (for example with the rpart and tree packages), and briefly place decision trees within larger ensembles such as random forests.
What is a decision tree and why is it so simple?
A decision tree is essentially a flow of yes/no questions (or simple rules) that leads you from the root decision to a final prediction at a leaf node. In a typical supervised learning setting, the goal is to predict a target variable Y from several predictors (features, covariates), and the tree learns a sequence of questions such as “is weight ≤ 103?” or “is country in {US, UK, CA}?” that gradually partition the data into more homogeneous groups.
To build some intuition, suppose you want to predict whether a person is obese using only height and weight, and you have a labeled dataset telling you who is obese and who is not. The tree might find a rule such as “if weight is above 100 kg, predict obese”, but that rule will not be perfect: some people above 100 kg will not be obese, and some below that threshold will be. The tree then keeps adding further questions (finer splits), for example on height or on a more refined weight threshold, to “fix” those imprecise initial predictions.
Each internal node of the tree corresponds to a decision rule, each branch corresponds to one outcome of that rule, and each leaf node corresponds to a region of feature space where the prediction is constant. In classification, a leaf returns a class label (or a probability distribution over labels); in regression, a leaf usually returns the mean of the target values that fall in that region.
One of the great strengths of decision trees is that they handle both regression and classification naturally, are easy to explain, and work with numeric and qualitative (categorical) predictors without heavy preprocessing. You do not need to assume any particular distribution for your features or target, which makes trees very attractive in real-world settings where the assumptions of classical linear models are often violated.
Classification trees vs. regression trees
Although classification and regression trees share the same structure, the type of the response variable Y and the cost function used for splitting differ between the two. If Y is quantitative (for example sales, life expectancy, fuel consumption), we speak of a regression tree; if Y is categorical or ordinal (for example survived vs. did not survive, obese vs. not obese), we speak of a classification tree.
In a regression tree, the usual aim is to partition feature space into regions where the response can be approximated by a constant, typically the mean of the observations in that region. Typical decision rules have the form “is $x_k \le c$?”, where $x_k$ is one of the covariates and $c$ is a threshold; these rules recursively partition the space into hyper-rectangles, and all points in the same hyper-rectangle share the same predicted value ŷ.
In a classification tree, splits still take the form “feature ≤ threshold?” or “category in set S?”, but the quality of a split is measured by how pure the resulting child nodes are in terms of class labels. The leaf prediction is usually the majority class within that node, and the model tries to create leaves that come as close as possible to containing a single class.
Despite this difference in target type, from a coding perspective you can use one generic tree structure and simply plug in different impurity or loss measures depending on whether you are doing regression or classification. Later, when we compute Information Gain, you will see that the formulas for classification (entropy-based) and regression (variance-based) are parallel in spirit.
Impurity and cost functions in decision trees
At the heart of any decision tree algorithm is a cost function that evaluates how good a particular split is at separating the data into meaningful groups. This cost function is expressed in terms of impurity: a node is considered pure if all its samples belong to the same class (for classification) or have the same numeric value (for regression).
Whenever you evaluate a candidate split on a feature, the algorithm looks at the child nodes it produces and asks: “how mixed are the labels (or values) in each child?” A good split is one that yields child nodes that are less impure than the parent, meaning the data within each child is more homogeneous with respect to the target.
In classification trees, impurity is usually measured with criteria such as the Gini index or entropy, both of which quantify how mixed the classes are within a node and therefore how error-prone a prediction made from that node would be. In regression trees, impurity is usually measured by squared error or variance, which reflects how spread out the target values are within the node.
Gini index: measuring impurity in classification trees
The Gini index is one of the most widely used impurity measures in classification trees because it is easy to compute and works well in practice. Conceptually, it measures the probability that an observation chosen at random from the node would be misclassified if its label were assigned at random according to the label distribution in that node.
If a node contains classes with probabilities $p_1, p_2, \dots, p_n$, the Gini index is computed as $\text{Gini} = 1 - \sum_i p_i^2$. If the node is perfectly pure (all observations belong to the same class), one of the probabilities is 1 and the rest are 0, so the sum of squares is 1 and the Gini index is 0, indicating perfect purity.
Conversely, the Gini index reaches its maximum when the classes are evenly mixed within the node, for example in a binary problem with $p_1 = p_2 = 0.5$, which gives $\text{Gini} = 1 - (0.5^2 + 0.5^2) = 0.5$. In that situation, predicting the majority class performs as badly as it possibly can, because the node contains half of each class.
When you implement Gini in code, you typically take the node's label vector, count the frequency of each class, convert the counts to probabilities and apply the formula $1 - \sum_i p_i^2$. If you do this for several candidate splits, you can compare which split produces children with the lowest weighted Gini impurity, which is exactly what the tree needs in order to pick the best split.
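The recipe above (count classes, convert to probabilities, apply the formula) can be sketched in a few lines of plain Python. The function names `gini_impurity` and `weighted_gini` are illustrative choices, and treating an empty node as pure is a convention assumed here, not something the text prescribes.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity 1 - sum(p_i^2) for a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumed convention: an empty node counts as pure
    counts = Counter(labels)  # frequency of each class
    return 1.0 - sum((count / n) ** 2 for count in counts.values())

def weighted_gini(left_labels, right_labels):
    """Size-weighted average Gini impurity of two child nodes."""
    n = len(left_labels) + len(right_labels)
    if n == 0:
        return 0.0
    return (len(left_labels) / n) * gini_impurity(left_labels) \
        + (len(right_labels) / n) * gini_impurity(right_labels)

pure = ["obese"] * 4
mixed = ["obese", "not obese", "obese", "not obese"]
print(gini_impurity(pure))   # 0.0
print(gini_impurity(mixed))  # 0.5
```

Comparing `weighted_gini` across candidate splits and keeping the smallest value is one way to rank them.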
Entropy: another view of classification impurity
Entropy is an alternative impurity measure widely used in information theory and in early tree algorithms such as ID3 and C4.5, and it captures the amount of randomness or uncertainty in a node's class distribution. While Gini focuses on misclassification, entropy measures the “surprise” associated with observing a particular class when the distribution is mixed.
Given class probabilities $p_1, \dots, p_c$ in a node $S$, its entropy is defined as $E(S) = -\sum_i p_i \log_2 p_i$. If the node is pure, one of the probabilities is 1 and all the others are 0, which makes the sum zero (because $\log_2 1 = 0$ and terms with $p_i = 0$ are taken to contribute 0), so the entropy is 0, indicating no uncertainty.
If a node contains a uniform distribution of classes, entropy is maximized; for a binary problem with $p_1 = p_2 = 0.5$, the entropy is 1 bit, which is the highest value for two classes. This value corresponds to maximal uncertainty, meaning the node is as impure as possible under that distribution.
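As a minimal sketch of the entropy formula, the function below counts only the classes that actually occur in the node, so the $p_i = 0$ terms never arise. The name `entropy` and the empty-node convention are assumptions made for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits, -sum(p_i * log2(p_i)), of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumed convention: an empty node has no uncertainty
    total = 0.0
    for count in Counter(labels).values():
        p = count / n  # p > 0, because only observed classes are counted
        total -= p * math.log2(p)
    return total

print(entropy(["survived"] * 6))                # 0.0
print(entropy(["survived", "did not survive"]))  # 1.0
```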
Even though Gini and entropy use different formulas and have different numeric ranges (Gini between 0 and 0.5 for two classes, entropy between 0 and 1), both measure the same underlying notion, so they usually lead to very similar trees in practice. If you compute both on the same node, you will find that a higher Gini goes with a higher entropy and vice versa, which is why many libraries let you choose either one without much change in performance.
Information Gain and choosing the best split
To pick the best split among many candidates, the tree algorithm uses a metric called Information Gain, which measures how much impurity drops when we split a node into its children. Intuitively, a split has high information gain if the children are much purer than the parent, meaning the rule has successfully separated the data into more meaningful groups.
For classification trees that use entropy, the Information Gain of a split is defined as $IG_{\text{classification}} = E(S_{\text{parent}}) - \sum_{\text{child}} \frac{|S_{\text{child}}|}{|S_{\text{parent}}|} \, E(S_{\text{child}})$. You first compute the entropy of the parent node, then subtract the weighted average entropy of the child nodes, where the weights are their relative sizes.
For regression trees, the same idea uses variance or mean squared error as the impurity measure, which gives $IG_{\text{regression}} = \mathrm{Var}(S_{\text{parent}}) - \sum_{\text{child}} \frac{|S_{\text{child}}|}{|S_{\text{parent}}|} \, \mathrm{Var}(S_{\text{child}})$. In this case, a good split is one that most reduces the variability of the target values within each child.
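Because the two formulas differ only in the impurity measure, a single function can cover both cases by taking the measure as an argument. The sketch below assumes population variance (dividing by the node size) and uses illustrative names throughout.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    total = 0.0
    for count in Counter(labels).values():
        p = count / n
        total -= p * math.log2(p)
    return total

def variance(values):
    """Population variance of a list of numbers (0.0 for an empty list)."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n

def information_gain(parent, children, impurity):
    """Parent impurity minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted

labels = ["yes", "yes", "no", "no"]
print(information_gain(labels, [["yes", "yes"], ["no", "no"]], entropy))  # 1.0

targets = [10.0, 12.0, 30.0, 32.0]
print(information_gain(targets, [[10.0, 12.0], [30.0, 32.0]], variance))  # 100.0
```

In the regression example the parent variance is 101 and each child has variance 1, so the split removes 100 units of variance.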
The tree training algorithm evaluates this Information Gain for every possible candidate split across all features, then selects the split with the highest gain, provided it exceeds a minimum threshold so as to avoid creating splits that bring only negligible improvement. This process is repeated recursively on each child node until some stopping criterion is met.
How to search for the best split on each feature
Finding the best split for a single feature depends on whether the feature is numeric or categorical, but the basic idea stays the same: enumerate the candidate splits and compute the Information Gain of each. For numeric features, a split is defined by a threshold; for categorical features, it is defined by grouping levels into subsets.
For a numeric predictor, a common strategy is to take all the distinct values the feature takes in the current node, sort them, and consider thresholds lying between consecutive values. For each candidate threshold c, you form two groups (x ≤ c and x > c), compute the impurity of each group, and then compute the Information Gain; the threshold that yields the highest gain is your best numeric split for that feature.
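The numeric search can be sketched as a loop over sorted distinct values. Using the midpoint between consecutive values as the threshold is an assumption (any value strictly between them gives the same partition), and the toy weights and targets below are made up purely for illustration.

```python
def variance(values):
    """Population variance of a list of numbers (0.0 for an empty list)."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n

def best_numeric_split(x, y, impurity):
    """Return (threshold, gain) of the best 'x <= c' split, or (None, 0.0) if none helps."""
    n = len(y)
    parent_impurity = impurity(y)
    distinct = sorted(set(x))
    best_threshold, best_gain = None, 0.0
    for low, high in zip(distinct, distinct[1:]):
        c = (low + high) / 2  # midpoint between consecutive distinct values
        left = [yi for xi, yi in zip(x, y) if xi <= c]
        right = [yi for xi, yi in zip(x, y) if xi > c]
        gain = (parent_impurity
                - len(left) / n * impurity(left)
                - len(right) / n * impurity(right))
        if gain > best_gain:
            best_threshold, best_gain = c, gain
    return best_threshold, best_gain

# Made-up toy data: weight in kg and a numeric target
weight = [60, 72, 85, 98, 104, 120]
target = [20.0, 22.0, 24.0, 31.0, 33.0, 35.0]
threshold, gain = best_numeric_split(weight, target, variance)
print(threshold, round(gain, 2))  # 91.5 30.25
```

This brute-force version recomputes the impurity for every threshold, which is fine for small nodes; faster implementations usually update running sums while scanning the sorted values.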
When dealing with categorical predictors, the search space is more complex because, in principle, any subset of categories can form one side of the split, with its complement on the other. For a feature with K categories there are $2^{K-1} - 1$ non-trivial partitions, so in practice it is common to restrict this search or use heuristics, especially when K is large.
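To see where the $2^{K-1} - 1$ count comes from, the sketch below enumerates every two-way partition exactly once by always keeping one fixed category on the left side. The function name `candidate_subsets` is hypothetical, and this exhaustive enumeration is only practical for small K.

```python
from itertools import combinations

def candidate_subsets(categories):
    """Yield the left-hand subset of each non-trivial two-way partition, once each."""
    cats = sorted(set(categories))
    if len(cats) < 2:
        return  # nothing to split on
    anchor, rest = cats[0], cats[1:]
    # Pinning one category to the left side avoids listing both a subset and its complement.
    # The size stops before len(rest) so the left side is never the full set.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            yield {anchor, *extra}

subsets = list(candidate_subsets(["US", "UK", "CA", "DE"]))
print(len(subsets))  # 7, i.e. 2**(4 - 1) - 1
```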
Once you have computed the best split for each feature, you compare their information gains and select the feature and threshold (or category subset) with the highest gain. This chosen split becomes the decision at the current node, and the training procedure then recurses into each child with the corresponding subset of observations.
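The comparison across features might look like the sketch below. It uses a Gini-based gain rather than entropy, handles numeric features only to stay short, and runs on an invented toy dataset; the dictionary layout of the result is likewise an assumption.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_threshold(x, y):
    """Best 'x <= c' split of one numeric feature: returns (threshold, gain)."""
    n, parent = len(y), gini(y)
    values = sorted(set(x))
    best_c, best_gain = None, 0.0
    for low, high in zip(values, values[1:]):
        c = (low + high) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= c]
        right = [yi for xi, yi in zip(x, y) if xi > c]
        gain = parent - len(left) / n * gini(left) - len(right) / n * gini(right)
        if gain > best_gain:
            best_c, best_gain = c, gain
    return best_c, best_gain

def best_split(columns, y):
    """columns maps feature name -> list of numeric values; returns the best split or None."""
    best = None
    for name, x in columns.items():
        c, gain = best_threshold(x, y)
        if c is not None and (best is None or gain > best["gain"]):
            best = {"feature": name, "threshold": c, "gain": gain}
    return best

# Invented toy data for illustration only
columns = {
    "height_cm": [160, 165, 170, 175, 180, 185],
    "weight_kg": [95, 60, 110, 70, 105, 65],
}
y = ["obese", "not", "obese", "not", "obese", "not"]
print(best_split(columns, y))
# {'feature': 'weight_kg', 'threshold': 82.5, 'gain': 0.5}
```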
Controlling tree growth with hyperparameters
If you let a decision tree grow without limits, it will keep splitting until every leaf is perfectly pure or contains very few observations, which usually leads to overfitting. To avoid this, you set a collection of hyperparameters that control the depth and complexity of the tree.
A common hyperparameter is max_depth, which caps the maximum number of levels the tree can grow from the root to any leaf. If max_depth is set to None (or a very large number), the tree can keep growing as long as the other constraints are satisfied; if it is small, the tree stays shallow and easier to interpret but may underfit.
Another important hyperparameter is min_samples_split, which specifies the minimum number of observations a node must have before it is allowed to split. If a node has fewer samples than this threshold, it is turned into a leaf, which stops the model from chasing noise in very small subsets of the data.
You can also use a minimum Information Gain (min_information_gain) so that the algorithm only splits when doing so yields a meaningful reduction in impurity. This avoids creating unnecessary branches that barely change the predictions and merely complicate the tree structure.
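The three hyperparameters can be folded into one stopping check, sketched below. The function name `should_stop`, its defaults and the choice to require a gain strictly above `min_information_gain` are assumptions for this from-scratch tree; they are not taken from any particular library.

```python
def should_stop(n_samples, depth, best_gain,
                max_depth=None, min_samples_split=2, min_information_gain=0.0):
    """Return True if the current node should become a leaf instead of splitting."""
    if max_depth is not None and depth >= max_depth:
        return True  # the tree is already as deep as allowed
    if n_samples < min_samples_split:
        return True  # too few observations to justify a split
    if best_gain is None or best_gain <= min_information_gain:
        return True  # no candidate split improves impurity enough
    return False

print(should_stop(n_samples=100, depth=3, best_gain=0.2, max_depth=3))  # True
print(should_stop(n_samples=50, depth=1, best_gain=0.2, max_depth=5,
                  min_samples_split=10, min_information_gain=0.01))      # False
```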
Building a decision tree from scratch in code
Implementing a decision tree from scratch usually revolves around a small set of core functions that are called recursively. Although libraries such as scikit-learn or rpart do all of this under the hood, writing these steps yourself makes the logic much clearer and gives you full control over the behavior.
First, you need a routine that, given the current data at a node, examines every feature and every candidate split to find the one with the highest information gain. This function returns the chosen feature, the split rule (threshold or subset of categories), the gain value, and a boolean mask or index sets indicating which samples go left and which go right.
Second, you need a prediction function for leaf nodes that turns the set of target values in that node into a single prediction. For regression, this is usually the mean of y in that node; for classification, you typically take the mode (the most frequent class), possibly storing class probabilities if you want probabilistic outputs.
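The two leaf rules just described can be sketched as follows; the function names are illustrative, and both assume the node is non-empty.

```python
from collections import Counter

def regression_leaf(targets):
    """Leaf prediction for regression: the mean of the target values in the node."""
    return sum(targets) / len(targets)

def classification_leaf(labels):
    """Leaf prediction for classification: (majority class, class probabilities)."""
    counts = Counter(labels)
    n = len(labels)
    majority = counts.most_common(1)[0][0]  # ties go to the class seen first
    probabilities = {label: count / n for label, count in counts.items()}
    return majority, probabilities

print(regression_leaf([20.0, 24.0, 28.0]))  # 24.0
label, probs = classification_leaf(["survived", "did not survive", "survived", "survived"])
print(label, probs)  # survived {'survived': 0.75, 'did not survive': 0.25}
```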
Third, you create a recursive training function that checks the stopping criteria, searches for the best split if allowed, and then builds the child nodes by calling itself on the left and right subsets. If the node has too few samples, the maximum depth has been reached, or the best gain does not exceed the minimum, the function stops splitting and stores a leaf prediction instead of further branches.
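Putting the three pieces together, a bare-bones recursive trainer for a regression tree could look like the sketch below. It assumes numeric features only, rows stored as lists, midpoint thresholds, and nodes represented as plain dictionaries; all of these are simplifications chosen for brevity rather than the only way to structure the code.

```python
def variance(values):
    """Population variance of a non-empty list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def find_best_split(X, y):
    """Try every feature and midpoint threshold; return (gain, feature_index, threshold) or None."""
    n, parent = len(y), variance(y)
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for low, high in zip(values, values[1:]):
            c = (low + high) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= c]
            right = [yi for row, yi in zip(X, y) if row[j] > c]
            gain = parent - len(left) / n * variance(left) - len(right) / n * variance(right)
            if best is None or gain > best[0]:
                best = (gain, j, c)
    return best

def build_tree(X, y, depth=0, max_depth=3, min_samples_split=2, min_information_gain=0.0):
    """Recursively grow a regression tree; each node is a dict."""
    leaf = {"prediction": sum(y) / len(y)}  # mean target, used if we stop here
    if depth >= max_depth or len(y) < min_samples_split:
        return leaf
    best = find_best_split(X, y)
    if best is None or best[0] <= min_information_gain:
        return leaf
    _, j, c = best
    left_idx = [i for i, row in enumerate(X) if row[j] <= c]
    right_idx = [i for i, row in enumerate(X) if row[j] > c]
    args = (depth + 1, max_depth, min_samples_split, min_information_gain)
    return {
        "feature": j,
        "threshold": c,
        "left": build_tree([X[i] for i in left_idx], [y[i] for i in left_idx], *args),
        "right": build_tree([X[i] for i in right_idx], [y[i] for i in right_idx], *args),
    }

# Invented toy data: one feature, two clearly separated groups
X = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
y = [5.0, 5.0, 5.0, 20.0, 20.0, 20.0]
tree = build_tree(X, y)
print(tree)
# {'feature': 0, 'threshold': 6.5, 'left': {'prediction': 5.0}, 'right': {'prediction': 20.0}}
```

Because thresholds sit between two values that both occur in the node, neither child can be empty, so the recursion always receives at least one sample.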
How prediction works in a trained decision tree
Once your tree is trained and has stored all its split rules and leaf predictions, making a prediction for a new observation is just a matter of walking down the tree from the root to a leaf. At each internal node, you look up the required feature and check whether the observation satisfies the node's condition.
If the split rule is numeric, you check whether the feature value is less than or equal to the threshold; if the split rule is categorical, you check whether the category belongs to a given subset. Depending on the outcome, you follow the appropriate branch (for example, “yes” to the left, “no” to the right) and repeat the process at the next node.
You keep descending the tree until you reach a node with no children, which is a leaf storing a constant output value or a class label. For a regression tree, the prediction will be a number such as an estimated life expectancy or fuel efficiency; for a classification tree, the output will be a predicted class such as “survived” or “did not survive”.
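The traversal can be sketched as a short loop. The nested-dictionary node layout and the tiny tree below are hypothetical: the tree is written by hand to show both rule types, not learned from any dataset.

```python
def predict_one(node, row):
    """Walk from the root to a leaf and return the leaf's stored prediction."""
    while "prediction" not in node:  # internal nodes carry a rule, leaves a prediction
        rule = node["rule"]
        if rule["kind"] == "numeric":
            go_left = row[rule["feature"]] <= rule["threshold"]
        else:  # categorical rule: is the category in the stored subset?
            go_left = row[rule["feature"]] in rule["subset"]
        node = node["left"] if go_left else node["right"]
    return node["prediction"]

# Hand-written illustrative tree ("yes" goes left, "no" goes right)
tree = {
    "rule": {"kind": "categorical", "feature": "sex", "subset": {"female"}},
    "left": {"prediction": "survived"},
    "right": {
        "rule": {"kind": "numeric", "feature": "age", "threshold": 9.5},
        "left": {"prediction": "survived"},
        "right": {"prediction": "did not survive"},
    },
}

print(predict_one(tree, {"sex": "female", "age": 30}))  # survived
print(predict_one(tree, {"sex": "male", "age": 40}))    # did not survive
print(predict_one(tree, {"sex": "male", "age": 5}))     # survived
```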
If you evaluate this procedure on the same data you used for training, you will often see high classification accuracy (for example, around 85% in some simple obesity or Titanic-style examples), but that performance may drop on unseen data if your tree is too deep. This is why controlling tree depth and size matters so much, and why ensembles such as random forests were developed to stabilize tree predictions.
Working with regression trees in practice
Regression trees are especially useful when the relationship between predictors and response is nonlinear and involves interactions that are hard to model with classic linear regression. Instead of trying to fit a single global equation, the tree partitions feature space into regions and fits a simple constant model within each region.
In R, popular packages such as rpart and tree make it easy to build regression trees with a single function call, specifying a formula such as y ~ x1 + x2 + … + x11. These packages are inspired by the original CART methodology described by Breiman and colleagues, and they implement most of the standard splitting and pruning ideas used in modern tree-based modeling.
For example, you can use the rpart package to model a response y based on eleven covariates x1 to x11, clean the data of missing values, and then visualize the resulting tree with helper functions such as prp from the rpart.plot package. The terminal nodes show the predicted y for each region, which you can apply directly to new observations.
Given a trained regression tree, you can feed new covariate values such as x9 = 70, x2 = 100 or x9 = 60, x2 = 150 to the predict function to obtain estimated values ŷ (for example around 20 or 28 in the fuel consumption example). Comparing these predictions with the observed values, for example via the correlation between y and ŷ, gives you a quick sense of how well the tree captures the underlying pattern, even when the dataset is quite small.
From single trees to random forests
A single decision tree is powerful but also known for being sensitive to the particulars of the training data, which can lead to high variance and overfitting. To mitigate this, random forests build many trees on bootstrapped samples of the data and combine their predictions, producing a model that is more stable and usually somewhat more accurate.
In a random forest, each tree is trained on a bootstrap sample, that is, a new dataset of the same size drawn from the original training set with replacement. This sampling procedure makes each tree see a slightly different dataset, so their errors are less correlated and can cancel out when aggregated.
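Drawing a bootstrap sample amounts to picking row indices with replacement. The sketch below uses Python's `random.Random` with a fixed seed for repeatability; the helper name `bootstrap_indices` is an illustrative choice.

```python
import random

def bootstrap_indices(n, rng):
    """Draw n row indices from range(n) with replacement."""
    return [rng.randrange(n) for _ in range(n)]

rng = random.Random(0)  # seeded so the draw is repeatable
indices = bootstrap_indices(10, rng)
in_bag = set(indices)   # rows this tree would see at least once
print(len(indices))     # 10
print(len(in_bag) <= 10)  # True: repeats mean some rows are usually left out
```

For large datasets, roughly 63% of the distinct rows are expected to appear in any one bootstrap sample, which is why each tree sees a noticeably different dataset.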
In addition, random forests inject randomness into feature selection by considering only a small random subset of the predictors at each split rather than all of them. This considerably reduces the correlation between trees, improves diversity in the forest, and usually lowers variance without increasing bias too much.
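Choosing the feature subset at a split can be sketched as sampling indices without replacement. Defaulting the subset size to the square root of the number of features is a common heuristic assumed here; the text does not specify a size, and the helper name is hypothetical.

```python
import math
import random

def random_feature_subset(n_features, rng, k=None):
    """Pick k distinct feature indices to consider at one split."""
    if k is None:
        k = max(1, int(math.sqrt(n_features)))  # assumed default: floor(sqrt(n_features))
    return sorted(rng.sample(range(n_features), k))

rng = random.Random(42)
subset = random_feature_subset(11, rng)  # e.g. eleven covariates x1..x11
print(len(subset))  # 3
```

A fresh subset would be drawn at every split, so a feature skipped at one node can still be used deeper in the same tree.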
The combination of bootstrapping and aggregating predictions is known as bagging, and in random forests you also get an internal estimate of the model's error by evaluating each tree on the data points that were not included in its bootstrap sample (the so-called out-of-bag observations). This out-of-bag error offers a convenient way to estimate performance without needing a separate validation set.
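The out-of-bag idea can be sketched independently of the tree itself. To keep the example short, the base learner below is a deliberate stand-in that just predicts the mean target of its bootstrap sample; in a real forest, `fit` and `predict` would train and query a tree. All names here are hypothetical.

```python
import random

def fit_mean(xs, ys):
    """Stand-in base learner (assumption): ignores xs and stores the mean target."""
    return sum(ys) / len(ys)

def predict_mean(model, x):
    """Prediction of the stand-in learner: the stored mean, whatever x is."""
    return model

def oob_mse(X, y, n_models, rng, fit=fit_mean, predict=predict_mean):
    """Out-of-bag mean squared error of a bagged regression ensemble."""
    n = len(y)
    sums, counts = [0.0] * n, [0] * n
    for _ in range(n_models):
        sample = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(sample)
        model = fit([X[i] for i in sample], [y[i] for i in sample])
        for i in range(n):
            if i not in in_bag:  # only models that never saw row i may vote on it
                sums[i] += predict(model, X[i])
                counts[i] += 1
    errors = [(sums[i] / counts[i] - y[i]) ** 2 for i in range(n) if counts[i] > 0]
    return sum(errors) / len(errors) if errors else None

X = [[v] for v in range(8)]
y = [3.0, 4.0, 5.0, 4.0, 3.0, 6.0, 5.0, 4.0]  # invented targets
error = oob_mse(X, y, n_models=25, rng=random.Random(1))
print(error is not None and error >= 0.0)  # True
```

Averaging the out-of-bag predictions per row before computing the error mirrors how the ensemble itself aggregates predictions.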
Although this article focuses on building a single tree from scratch, understanding how that basic building block works makes it much easier to grasp how ensembles such as random forests, gradient boosting and other tree-based methods build on the same principles to achieve state-of-the-art results on many applied problems.
Putting it all together, building a decision tree from scratch shows you how a simple set of rules, cost functions and recursive splits can model complex relationships, whether you are predicting a binary outcome such as survival, a categorical label such as obesity status, or a numeric target such as life expectancy or fuel consumption. This deeper understanding becomes a solid foundation for using more advanced tree-based techniques in practice.