Auxiliary variable
Classification
In terms of their nature, auxiliary variables can be divided into the following categories:
1. Information reflecting the overall structure of the population. For example, the sampling units of the population can be divided into several strata according to how much the values of the survey indicator differ. The composition of each stratum and its proportion of the population reflect the overall structure.
2. Information on scale. For example, when the population is divided into sampling units at several levels, the size of a sampling unit at a given level can be indicated by the number of next-level sampling units it contains, or by some other measure of size.
3. Information on auxiliary indicators closely related to the survey indicators. For example, when investigating the consumption expenditure of residents, residents' disposable income is an auxiliary indicator that is highly correlated with expenditure. Information on such an auxiliary indicator may be available for the whole population, for the sampling units, or only for the sample.
4. Historical information on the survey indicators. For example, in some regular sample surveys, the values of the survey indicators from the previous survey are often used as auxiliary indicators for the current ones, and sometimes the most recent census values for the same survey item can serve as auxiliary indicators for the current sample survey. In this case, the information provided by the auxiliary indicators is historical information on the survey indicators themselves.
Role
The role of auxiliary variables is mainly reflected in two aspects: the first is to improve the sampling method and make the sample more representative of the population; the second is to improve the estimation method, reducing estimation error and improving estimation accuracy. For the first aspect, there are three main modes of action.
1. The role of auxiliary variables in stratified sampling.
Stratified sampling is frequently used in practice. Its efficiency depends mainly on the within-stratum variance and the between-stratum variance of the population. Using auxiliary variables to stratify the population can effectively reduce the within-stratum variance and enlarge the between-stratum variance, lowering the ratio of the former to the latter and thereby greatly improving the efficiency of sampling estimation.
2. The role of auxiliary variables in systematic sampling.
Symmetric equidistant sampling is suited to populations with a linear trend, but most populations encountered in practice do not have one. Ranking the population units by an auxiliary variable can give such a population an approximately linear trend, and applying symmetric equidistant sampling on this basis is an effective way to improve sampling efficiency.
3. Using each unit's share of the auxiliary variable total as its selection probability to implement unequal probability sampling.
In particular, unequal probability sampling with clusters as the sampling unit can effectively eliminate the "number level effect" and "rate number variability effect" of equal probability cluster sampling and improve the efficiency of cluster sampling. Auxiliary variables make it possible to implement unequal probability sampling and so improve the sampling design.
For the second aspect there are also common applications, for example the use of auxiliary variables to construct ratio estimators and regression estimators. Constructing a ratio or regression estimator for the population mean or total is an important means of improving the sampling design at the estimation stage, but both estimation methods require a suitable auxiliary variable whose population mean or total is known. Beyond these two aspects, auxiliary variables can sometimes also be used to deal with missing data in sample surveys.
Using auxiliary variables in sampling design can greatly improve the accuracy of estimation. Therefore, at the beginning of sampling design, we should investigate whether auxiliary variable data are available and then consider how best to use them.
Use
Using auxiliary variables in sampling design has two purposes: one is to improve the sampling method, and the other is to improve the estimator. Because surveys differ in their estimators and sampling methods, how auxiliary variables are used depends on the situation.
1. Use auxiliary variables to stratify the population
Stratified sampling is frequently used in practice, and it remains suitable when two or more auxiliary variables are available. Its efficiency depends mainly on the ratio of the within-stratum variance to the between-stratum variance of the population. Using auxiliary variables to stratify the population can effectively reduce the within-stratum variance, enlarge the between-stratum variance, and lower the ratio of the former to the latter, thereby greatly improving the efficiency of sampling estimation. The auxiliary variable can also be used to determine the optimal stratum boundaries, so that the variance of the estimate of the target quantity is minimized under otherwise identical conditions.
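The effect of stratifying on an auxiliary variable can be checked numerically. The following is a minimal sketch, assuming a hypothetical population in which the survey variable y is roughly proportional to the auxiliary variable x; the population model, the number of strata, and the helper name `stratify_by_auxiliary` are illustrative assumptions, not part of the original text.

```python
import random
import statistics

random.seed(0)

# Hypothetical population: x is the auxiliary variable, y the survey variable.
N = 1000
x = [random.uniform(0, 100) for _ in range(N)]
y = [2.0 * xi + random.gauss(0, 10) for xi in x]

def stratify_by_auxiliary(x, y, n_strata):
    """Sort units by x and cut them into n_strata strata of (almost) equal size.

    Returns the y values of each stratum.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    size = len(x) // n_strata
    strata = []
    for h in range(n_strata):
        start = h * size
        end = (h + 1) * size if h < n_strata - 1 else len(x)
        strata.append([y[i] for i in order[start:end]])
    return strata

strata = stratify_by_auxiliary(x, y, n_strata=4)

# Total variance splits into a within-stratum and a between-stratum part
# (population variances, strata weighted by their share N_h / N).
total_var = statistics.pvariance(y)
within_var = sum(len(s) / N * statistics.pvariance(s) for s in strata)
between_var = total_var - within_var

print(f"total variance:          {total_var:.1f}")
print(f"within-stratum variance: {within_var:.1f}")
print(f"between-stratum variance:{between_var:.1f}")
```

In this sketch most of the variance of y ends up between strata, which is the situation in which stratified sampling gains the most over simple random sampling.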
2. Use auxiliary variables to order the population units, then perform systematic sampling
This is systematic sampling with the units ordered by a related characteristic. The accuracy of systematic sampling depends on the variance within the systematic samples: the greater the within-sample variance, the smaller the variance of the estimate of the target quantity. Ordering by an auxiliary variable gives a population without a linear trend an approximately linear one. Systematic sampling carried out on this basis increases the within-sample variance of the resulting systematic samples and thereby improves accuracy. The method is simple and convenient to operate, and it generally works well. Its disadvantages are that the variance of the estimator is difficult to estimate and that, because the ordering uses only the rank order of the auxiliary variable, the auxiliary information is not fully exploited.
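As an illustration of the ordering idea, the sketch below enumerates all possible systematic samples of a hypothetical population, once in arbitrary order and once ordered by the auxiliary variable, and compares the exact variance of the sample mean in both cases. The population model, the sampling interval `k`, and the function names are assumptions made for the example.

```python
import random
import statistics

random.seed(1)

# Hypothetical population: y depends on the auxiliary variable x plus noise.
N, k = 1000, 20                      # k is the sampling interval, n = N / k = 50
x = [random.uniform(0, 100) for _ in range(N)]
y = [2.0 * xi + random.gauss(0, 10) for xi in x]

def systematic_means(values, k):
    """Sample means of all k possible systematic samples (start r, step k)."""
    return [statistics.fmean(values[r::k]) for r in range(k)]

def estimator_variance(values, k):
    """Exact variance of the systematic-sample mean over the k possible starts."""
    return statistics.pvariance(systematic_means(values, k))

y_arbitrary_order = y                                  # frame in arbitrary order
y_sorted_by_x = [yi for _, yi in sorted(zip(x, y))]    # frame ordered by x

var_arbitrary = estimator_variance(y_arbitrary_order, k)
var_sorted = estimator_variance(y_sorted_by_x, k)

print(f"variance of the mean, arbitrary order: {var_arbitrary:.2f}")
print(f"variance of the mean, ordered by x:    {var_sorted:.2f}")
```

After ordering, every systematic sample spans the whole range of x, so each sample is internally heterogeneous and the sample means differ little from one start to another.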
3. Use auxiliary variables for unequal probability sampling
Unequal probability sampling, especially with clusters as the sampling unit, can effectively improve the efficiency of cluster sampling. When using auxiliary variables for unequal probability sampling, practitioners often use one of two methods: PPS sampling with replacement and PPS systematic sampling. For the former, the sampling procedure, the estimator, and the variance estimation are all extremely simple, but the accuracy is slightly worse; for the latter, as with any systematic sampling, variance estimation is more difficult.
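PPS sampling with replacement can be sketched in a few lines. The example below assumes hypothetical clusters whose size is the auxiliary variable, and it uses the Hansen-Hurwitz estimator of the population total, a standard choice for this design that the text above does not name explicitly. All data and names are illustrative.

```python
import random
import statistics

rng = random.Random(2)

# Hypothetical clusters: size is the auxiliary variable, total the survey variable.
sizes = [rng.randint(10, 200) for _ in range(50)]
totals = [s * rng.uniform(4.0, 6.0) for s in sizes]
M = sum(sizes)
true_total = sum(totals)

def pps_wr_estimate(sizes, totals, n, rng):
    """Draw n clusters with replacement, with probability proportional to size.

    Returns the Hansen-Hurwitz estimate of the population total and an
    estimate of its variance.
    """
    size_sum = sum(sizes)
    probs = [s / size_sum for s in sizes]
    draws = rng.choices(range(len(sizes)), weights=sizes, k=n)
    ratios = [totals[i] / probs[i] for i in draws]   # y_i / p_i for each draw
    estimate = statistics.fmean(ratios)
    var_estimate = statistics.variance(ratios) / n   # s^2 of the ratios over n
    return estimate, var_estimate

estimate, var_estimate = pps_wr_estimate(sizes, totals, n=10, rng=rng)
print(f"true total:     {true_total:.0f}")
print(f"estimate:       {estimate:.0f}")
print(f"standard error: {var_estimate ** 0.5:.0f}")
```

Because the cluster totals here are nearly proportional to cluster size, the ratios y_i / p_i vary little, which is what makes the estimator precise.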
4. Use auxiliary variables to construct ratio estimators and regression estimators
Constructing a ratio or regression estimator for the population mean or total is an important means of improving the sampling design at the estimation stage. Both estimation methods require a suitable auxiliary variable whose population mean or total is known. Ratio estimation and regression estimation are usually very effective, especially when the auxiliary variable is highly correlated with the survey indicator. One great advantage is that they can be used in multi-indicator surveys, in which different indicators often use different auxiliary variables. The disadvantages of ratio and regression estimation are that the calculation is more complicated and the estimator is biased. However, when the sample size is relatively large the bias is small: in large samples, the bias accounts for only a small part of the total error relative to the variance.
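To make the comparison concrete, here is a small simulation sketch that contrasts the simple sample mean with the ratio and regression estimators of the population mean. It assumes a hypothetical population in which y is roughly proportional to x and in which the population mean of x is known; the sample size, number of replications, and names are illustrative.

```python
import random
import statistics

rng = random.Random(3)

# Hypothetical population; the population mean of x is assumed to be known.
N = 2000
x = [rng.uniform(20, 100) for _ in range(N)]
y = [2.0 * xi + rng.gauss(0, 10) for xi in x]
X_bar = statistics.fmean(x)
Y_bar = statistics.fmean(y)          # the target, unknown in practice

def estimators(sample_idx):
    """Simple, ratio, and regression estimates of the population mean of y."""
    xs = [x[i] for i in sample_idx]
    ys = [y[i] for i in sample_idx]
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    # Sample regression slope of y on x.
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    sxx = sum((a - x_bar) ** 2 for a in xs)
    slope = sxy / sxx
    return {
        "simple": y_bar,
        "ratio": y_bar / x_bar * X_bar,
        "regression": y_bar + slope * (X_bar - x_bar),
    }

def mse_by_simulation(n, reps):
    """Mean squared error of each estimator over repeated simple random samples."""
    sq_err = {"simple": 0.0, "ratio": 0.0, "regression": 0.0}
    for _ in range(reps):
        est = estimators(rng.sample(range(N), n))
        for name, value in est.items():
            sq_err[name] += (value - Y_bar) ** 2
    return {name: total / reps for name, total in sq_err.items()}

mse = mse_by_simulation(n=30, reps=500)
for name, value in mse.items():
    print(f"{name:>10}: MSE = {value:.2f}")
```

The mean squared error includes any bias of the ratio and regression estimators, so the comparison reflects total error rather than variance alone.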
5. Use auxiliary variables for post-stratification.
Sometimes it is difficult to stratify in advance. Without a sampling frame that identifies the strata, stratified sampling cannot be carried out and a stratified sample cannot be obtained. If we still want the benefits of stratification, namely improved accuracy and an estimate for each subpopulation, post-stratification must be used. One prerequisite for post-stratification is that the stratum weights can be obtained by some means and are therefore known. Post-stratification makes fewer demands on auxiliary information: it does not require information on every unit of the population, only certain summary information. Its cost is therefore lower, and with a reasonably large sample its precision is generally close to that of proportionally allocated stratified sampling.
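A minimal sketch of post-stratification follows, assuming a hypothetical population with two strata whose weights are known from an external source. A simple random sample is drawn first, and the units are classified into strata only after the data are collected. The stratum labels, weights, and function name are assumptions for illustration.

```python
import random
import statistics

rng = random.Random(4)

# Hypothetical population of (stratum label, survey value) pairs.
population = (
    [("urban", rng.gauss(50, 5)) for _ in range(700)]
    + [("rural", rng.gauss(30, 5)) for _ in range(300)]
)
weights = {"urban": 0.7, "rural": 0.3}   # stratum weights W_h, assumed known
true_mean = statistics.fmean(value for _, value in population)

def post_stratified_mean(sample, weights):
    """Weight the sample mean of each stratum by its known population share."""
    groups = {}
    for stratum, value in sample:
        groups.setdefault(stratum, []).append(value)
    missing = set(weights) - set(groups)
    if missing:
        raise ValueError(f"no sampled units in strata: {sorted(missing)}")
    return sum(w * statistics.fmean(groups[h]) for h, w in weights.items())

sample = rng.sample(population, 100)     # simple random sample, no frame by stratum
estimate = post_stratified_mean(sample, weights)

print(f"true mean:            {true_mean:.2f}")
print(f"simple sample mean:   {statistics.fmean(v for _, v in sample):.2f}")
print(f"post-stratified mean: {estimate:.2f}")
```

Only the two weights are needed from outside the sample, which is the "summary information" the text refers to.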
Choice
At the beginning of the sampling design, we should investigate whether usable auxiliary variables exist, and then consider which variables can serve as auxiliary variables, which of them are better, and how to choose an appropriate auxiliary variable from a large number of candidates.
1. Ways to obtain auxiliary variables
1) Historical data
Statistical surveys generally combine periodic censuses with regular sample surveys. Periodic censuses provide a large amount of historical data for the sample surveys conducted between two censuses, including historical data on the survey variables themselves and on other variables related to them. These data generally have the advantages of low acquisition cost, high accuracy, and high correlation with the research variables.
2) Current related data
Some research variables have related data for the same period. For example, police stations generally hold relatively complete demographic data for their jurisdiction, and administrative departments for industry and commerce hold relatively complete business registration data for theirs. These data are correlated with many variables and can be used as auxiliary variables for the research variables in the sampling design.
3) Pilot survey
If population-level data are not available, a pilot survey can be considered in order to obtain the auxiliary variable data required by the sampling design. Samples for pilot surveys can be drawn by either probability or non-probability sampling methods.
4) Double sampling
If no convenient auxiliary variable exists, double sampling can also be considered. First, draw a relatively large simple random sample from the population to estimate the auxiliary variable. Then treat this sample as a small population and carry out on it the sampling design that requires the auxiliary variable. When the total survey budget is fixed, drawing and surveying the first-phase sample uses up part of the budget, so the size of the second-phase sample has to be reduced. The cost of the first phase buys auxiliary information that improves the estimation accuracy of the second phase, but the second phase also loses estimation accuracy because its sample size is smaller. Double sampling is worthwhile only when the former gain in accuracy exceeds the latter loss.
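The two phases of double sampling can be sketched as follows. The example assumes a hypothetical population, a cheap-to-observe auxiliary variable x, and a ratio estimator in the second phase; the text above does not prescribe a particular estimator, so that choice, along with the sample sizes and names, is an assumption.

```python
import random
import statistics

rng = random.Random(6)

# Hypothetical population; the population mean of x is NOT known in advance.
N = 5000
x = [rng.uniform(20, 100) for _ in range(N)]
y = [2.0 * xi + rng.gauss(0, 10) for xi in x]
true_mean = statistics.fmean(y)

def double_sampling_ratio_estimate(n1, n2):
    """Two-phase (double) sampling with a ratio estimator.

    Phase 1: a large simple random sample on which only x is observed.
    Phase 2: a subsample of phase 1 on which y is observed as well.
    """
    phase1 = rng.sample(range(N), n1)
    x_bar_1 = statistics.fmean([x[i] for i in phase1])   # estimates the mean of x

    phase2 = rng.sample(phase1, n2)
    x_bar_2 = statistics.fmean([x[i] for i in phase2])
    y_bar_2 = statistics.fmean([y[i] for i in phase2])

    # Ratio estimator with the phase-1 mean standing in for the unknown X_bar.
    return y_bar_2 / x_bar_2 * x_bar_1

estimate = double_sampling_ratio_estimate(n1=400, n2=40)
print(f"true mean of y:           {true_mean:.2f}")
print(f"double-sampling estimate: {estimate:.2f}")
```

Whether this beats spending the whole budget on a single larger sample depends on the relative cost of observing x and y, which is the trade-off described above.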
2. The principle for selecting auxiliary variables: the best-effect principle
Sample surveys pose an input-output problem. The input is the survey budget, and the output is the estimate of the population quantity, whose quality is measured by sampling accuracy. The best-effect principle can therefore be decomposed into two aspects: the principle of highest sampling accuracy and the principle of lowest survey cost.
The former requires that the auxiliary variables be chosen to suit the sampling and estimation methods, so as to achieve the highest sampling accuracy. This is because different sampling and estimation methods place different requirements on the relationship between auxiliary variables and research variables. For example, auxiliary variables used for stratification and regression estimation should have a high linear correlation with the research variables; auxiliary variables used for PPS sampling and ratio estimation should be approximately proportional to the research variables. The latter requires that the auxiliary variables be chosen to give the lowest cost for a given sampling accuracy. This is because the cost of obtaining values differs greatly between auxiliary variables: some can be obtained at relatively low cost, while others can be obtained only at considerable cost. Auxiliary variables are introduced to improve the accuracy of estimation. Under a fixed survey budget, the more that is spent on collecting auxiliary variables, the less remains for surveying the sample, which reduces the achievable sample size.
Auxiliary variable method
The auxiliary variable method, also called the instrumental variable method, is an improved least squares parameter estimation method.
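Suppose
$$y(t) = \varphi^{T}(t)\,\theta_0 + v(t),$$
where y(t) is the measurement, φ(t) is the regression vector, θ0 is the parameter vector to be estimated (T stands for transposition), and v(t) is correlated noise. Then the least squares estimate
$$\hat{\theta}_{LS}(N) = \Big[\sum_{t=1}^{N} \varphi(t)\,\varphi^{T}(t)\Big]^{-1} \sum_{t=1}^{N} \varphi(t)\,y(t)$$
does not converge to the true value θ0. In this case, an auxiliary vector z(t) can be used, as long as:
1. z(t) and v(t) are uncorrelated, that is, $E[z(t)\,v(t)] = 0$.
2. The matrix $E[z(t)\,\varphi^{T}(t)]$ is invertible.
3. The mean of v(t) is zero.
It can be proved that when $N \to \infty$, the auxiliary variable estimator
$$\hat{\theta}_{IV}(N) = \Big[\sum_{t=1}^{N} z(t)\,\varphi^{T}(t)\Big]^{-1} \sum_{t=1}^{N} z(t)\,y(t)$$
converges to the true value θ0.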
The key to the auxiliary variable method is how to construct an auxiliary variable that satisfies the above conditions; that is, the auxiliary variable should be uncorrelated with the noise v(t) but strongly correlated with φ(t). One common choice is to drive a deterministic (noise-free) model of the system with the input and to build the auxiliary variables from the resulting model output together with the input; another is to use only delayed inputs as auxiliary variables. The basic idea of the auxiliary variable method is very valuable. With an appropriate choice of auxiliary variables, the method can be related to other identification methods, so it can be used on many occasions. There is also a recursive auxiliary variable method, which is likewise widely used.
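The difference between least squares and the auxiliary (instrumental) variable estimator can be illustrated with a small simulation. The sketch below assumes a hypothetical first-order system y(t) = a·y(t-1) + b·u(t-1) + v(t) with moving-average noise v(t) = e(t) + c·e(t-1), so that φ(t) = (y(t-1), u(t-1)) is correlated with v(t). The auxiliary vector is built from delayed inputs, z(t) = (u(t-2), u(t-1)). The system, its parameter values, and all names are assumptions for the example.

```python
import random

rng = random.Random(5)

# Hypothetical true parameters theta0 = (a, b) and noise coefficient c.
a_true, b_true, c = 0.7, 1.0, 0.9
N = 20000

u = [rng.gauss(0, 1) for _ in range(N)]   # white-noise input
e = [rng.gauss(0, 1) for _ in range(N)]   # white noise driving v(t)
y = [0.0] * N
for t in range(1, N):
    v = e[t] + c * e[t - 1]               # correlated (moving-average) noise
    y[t] = a_true * y[t - 1] + b_true * u[t - 1] + v

def solve_2x2(A, r):
    """Solve the 2x2 linear system A * theta = r by Cramer's rule."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    return [(a22 * r[0] - a12 * r[1]) / det,
            (a11 * r[1] - a21 * r[0]) / det]

def estimate(instrument):
    """Return [sum z(t) phi(t)^T]^(-1) * sum z(t) y(t).

    With z(t) = phi(t) this is ordinary least squares.
    """
    A = [[0.0, 0.0], [0.0, 0.0]]
    r = [0.0, 0.0]
    for t in range(2, N):
        phi = (y[t - 1], u[t - 1])
        z = instrument(t)
        for i in range(2):
            r[i] += z[i] * y[t]
            for j in range(2):
                A[i][j] += z[i] * phi[j]
    return solve_2x2(A, r)

theta_ls = estimate(lambda t: (y[t - 1], u[t - 1]))   # z(t) = phi(t)
theta_iv = estimate(lambda t: (u[t - 2], u[t - 1]))   # delayed inputs as z(t)

print(f"true:          a = {a_true:.3f}, b = {b_true:.3f}")
print(f"least squares: a = {theta_ls[0]:.3f}, b = {theta_ls[1]:.3f}")
print(f"auxiliary var: a = {theta_iv[0]:.3f}, b = {theta_iv[1]:.3f}")
```

In this setup y(t-1) contains e(t-1), which also appears in v(t), so the least squares estimate of a stays biased however long the record is. The delayed inputs are independent of the noise but still correlated with φ(t), which is what conditions 1 and 2 require.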