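Precision and recall will be the same if the number of predictions is the same as the number of true annotations. Both have the number of correct predictions as the numerator; precision divides it by the total number of predictions, and recall divides it by the total number of true annotations, so equal totals give equal scores. If precision and recall are the same, then F-score must be the same value as both of them as well (since F-score is the harmonic mean of the two values).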
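To make the arithmetic concrete, here is a minimal sketch with made-up data. It assumes entities are represented as `(start, end, label)` tuples and that a prediction only counts as correct on an exact match; the helper name `prf` and the example spans are hypothetical, not taken from any particular library.

```python
def prf(gold, pred):
    """Return (precision, recall, F1) for two sets of (start, end, label) tuples."""
    tp = len(gold & pred)  # exact matches on span and label
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical data: 4 gold entities, 4 predictions, one with the wrong label.
gold = {(0, 5, "PERSON"), (10, 16, "ORG"), (20, 26, "GPE"), (30, 34, "DATE")}
pred = {(0, 5, "PERSON"), (10, 16, "ORG"), (20, 26, "ORG"), (30, 34, "DATE")}

p, r, f = prf(gold, pred)
print(p, r, f)  # all three are 0.75, since both denominators are 4
```

Because the one wrong prediction counts as both a false positive and a false negative, the two denominators stay equal and all three scores coincide.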
You probably want to have a look at your predictions and compare them to the gold standard, to see what’s up. It might be that your model only makes mistakes on the entity type, but not the span boundaries, for instance. Or it might be a less interesting coincidence; after all, the evaluation set is quite small.
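One way to do that comparison is to bucket each prediction by how it relates to the gold annotations. The following is only a rough sketch: it assumes you can export both sides as `(start, end, label)` tuples with an exclusive end offset, and the function name `categorize_errors` and the bucket names are made up for illustration.

```python
def categorize_errors(gold, pred):
    """Bucket predictions into correct / wrong_label / wrong_boundary / spurious,
    and list gold entities that no prediction overlaps as missed."""
    gold_spans = {(s, e): label for s, e, label in gold}
    report = {"correct": [], "wrong_label": [], "wrong_boundary": [], "spurious": []}

    for start, end, label in sorted(pred):
        if (start, end) in gold_spans:
            # Same boundaries: only the entity type can be wrong.
            key = "correct" if gold_spans[(start, end)] == label else "wrong_label"
        elif any(start < g_end and g_start < end for g_start, g_end in gold_spans):
            # Overlaps a gold span, but the boundaries differ.
            key = "wrong_boundary"
        else:
            key = "spurious"
        report[key].append((start, end, label))

    pred_spans = [(s, e) for s, e, _ in pred]
    report["missed"] = [
        (s, e, label)
        for s, e, label in sorted(gold)
        if not any(s < p_end and p_start < e for p_start, p_end in pred_spans)
    ]
    return report


# Hypothetical example: one label error and one boundary error.
gold = [(0, 5, "PERSON"), (10, 16, "ORG"), (20, 26, "GPE")]
pred = [(0, 5, "PERSON"), (10, 16, "GPE"), (20, 24, "GPE")]

report = categorize_errors(gold, pred)
for bucket, items in report.items():
    print(bucket, items)
```

If nearly everything lands in `wrong_label`, every prediction pairs off with one gold span, which would explain equal counts on both sides. A mix of buckets on a small evaluation set points more towards coincidence.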