I tried to reproduce your error with a whole custom-vectors setup locally, but couldn't. For completeness, I'll share all of the steps.
## Start
I started by annotating some data. This is my `examples.jsonl`:

```json
{"text": "hi my name is vincent"}
{"text": "hi my name is john"}
{"text": "hi my name is jenny"}
{"text": "hi my name is noa"}
```
I've annotated these via:

```
python -m prodigy ner.manual issue-6020 blank:en examples.jsonl --label name
```
It's a very basic dataset, but it'll do.
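If you'd rather generate such a file from Python than write it by hand, here's a minimal sketch using `srsly` (which ships with spaCy); the file name and texts match the ones above:

```python
import srsly

# Hypothetical generator for the annotation tasks shown above; each
# dict becomes one line in the .jsonl file, i.e. one task for ner.manual.
names = ["vincent", "john", "jenny", "noa"]
srsly.write_jsonl(
    "examples.jsonl",
    ({"text": f"hi my name is {name}"} for name in names),
)
```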
## Custom Vectors
I created a `text.txt` file with the following content to train embeddings.

```
this file contains some text
not a whole lot
just enough to provide a demo
```
To train some embeddings I figured I'd use floret. So I install it first:

```
python -m pip install --upgrade pip
python -m pip install floret
```
And then I train it. I used the script found here. This is `train_floret.py`:
```python
import typer
from pathlib import Path

import floret


def main(
    input_file: Path,
    output_stem: str,
    mode: str = "floret",
    model: str = "cbow",
    dim: int = 300,
    mincount: int = 10,
    minn: int = 5,
    maxn: int = 6,
    neg: int = 10,
    hashcount: int = 2,
    bucket: int = 20000,
    thread: int = 8,
):
    # Train fastText-style embeddings; mode="floret" stores subword
    # hashes in a compact table that spaCy can load directly.
    floret_model = floret.train_unsupervised(
        str(input_file.absolute()),
        model=model,
        mode=mode,
        dim=dim,
        minCount=mincount,
        minn=minn,
        maxn=maxn,
        neg=neg,
        hashCount=hashcount,
        bucket=bucket,
        thread=thread,
    )
    # Save the binary model, the plain-text vectors, and (in floret
    # mode) the .floret table that `spacy init vectors` consumes.
    floret_model.save_model(output_stem + ".bin")
    floret_model.save_vectors(output_stem + ".vec")
    if mode == "floret":
        floret_model.save_floret_vectors(output_stem + ".floret")


if __name__ == "__main__":
    typer.run(main)
```
And I trained my embeddings via:

```
python train_floret.py text.txt vectors --mincount 1
```
This generates a `vectors.floret` file locally, which I can use to bootstrap a spaCy pipeline with vectors.
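As a quick sanity check before wiring things into spaCy, you can load the binary model back in. This is a sketch that assumes floret mirrors the fastText loading API; `vectors.bin` is the file written by `save_model()` above:

```python
import floret

# Load the binary model written by train_floret.py.
model = floret.load_model("vectors.bin")
print(model.get_dimension())              # 300, matching --dim
# Subword hashing means even unseen words get a vector.
print(model.get_word_vector("demo")[:5])
```

The bootstrap step itself: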
```
python -m spacy init vectors en vectors.floret --mode floret custom_model
```

This creates a folder called `custom_model`.
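To double-check that the vectors actually ended up in the pipeline, a short sketch (the `mode` attribute on the vectors table should be available on spaCy v3.2+):

```python
import spacy

nlp = spacy.load("custom_model")
vectors = nlp.vocab.vectors
print(vectors.shape)  # rows of the floret hash table x vector dim
print(vectors.mode)   # "floret"
# With floret vectors, every token gets a (subword-based) vector.
print(nlp("hi my name is vincent")[0].has_vector)
```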
## Training Models
I will now train two models. One based off the `custom_model`, via:

```
python -m prodigy train --ner issue-6020 --base-model custom_model --training.max_steps=50 --training.eval_frequency=10
```
This is the epoch table I see at the end:
```
  E       #  LOSS TOK2VEC  LOSS NER  ENTS_F  ENTS_P  ENTS_R   SCORE
---  ------  ------------  --------  ------  ------  ------  ------
  0       0          0.00     12.67    0.00    0.00    0.00    0.00
 10      10          0.13    105.36    0.00    0.00    0.00    0.00
 20      20          0.03      6.47    0.00    0.00    0.00    0.00
 30      30          0.00      0.00    0.00    0.00    0.00    0.00
 40      40          0.00      0.00    0.00    0.00    0.00    0.00
 50      50          0.00      0.00    0.00    0.00    0.00    0.00
```
And another one based on `en_core_web_sm`, via:

```
python -m prodigy train --ner issue-6020 --base-model en_core_web_sm --training.max_steps=50 --training.eval_frequency=10
```
This gives a different table.
```
  E       #  LOSS TOK2VEC  LOSS NER  ENTS_F  ENTS_P  ENTS_R   SPEED   SCORE
---  ------  ------------  --------  ------  ------  ------  ------  ------
  0       0          0.00      7.72    0.00    0.00    0.00    0.00    0.00
 10      10          0.00     55.98    0.00    0.00    0.00    0.00    0.00
 20      20          0.00     35.33    0.00    0.00    0.00    0.00    0.00
 30      30          0.00      1.54    0.00    0.00    0.00    0.00    0.00
 40      40          0.00      0.00    0.00    0.00    0.00    0.00    0.00
 50      50          0.00      0.00    0.00    0.00    0.00    0.00    0.00
```
## Back to your problem
Could you repeat the same exercise on your machine? You don't have to train your own floret vectors, but you'll notice that I ran both training commands with `--training.max_steps=50` and `--training.eval_frequency=10`. That allows me to see a difference. If I had only looked at steps of 100, the results might indeed have looked exactly the same.
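If you want to convince yourself that the two base models really do start from different static vectors (which is why their loss curves differ), here's a small sketch; it assumes `custom_model` sits in the working directory and `en_core_web_sm` is installed:

```python
import spacy

for name in ["custom_model", "en_core_web_sm"]:
    nlp = spacy.load(name)
    vectors = nlp.vocab.vectors
    # en_core_web_sm ships without static vectors, so its table is empty,
    # while custom_model carries the floret table trained above.
    print(f"{name}: shape={vectors.shape}, mode={vectors.mode}")
```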