[Feature request] Allow different network architectures for off-policy actor/critic

Currently, for on-policy algorithms, we can specify `net_arch=[dict(pi=[64], vf=[64])]`.

The idea would be to allow the same for off-policy algorithms (except the shared part, which adds too much complexity):
`net_arch=[dict(pi=[64], qf=[64])]`.

This should be fairly simple to implement.
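
For illustration, here is a rough sketch of how the parsing could look. The helper name `split_off_policy_arch` and the fallback behaviour (a plain list of ints is used for both networks) are assumptions made for this example, not existing library code:

```python
from typing import Dict, List, Tuple, Union

# Hypothetical type alias for this sketch: either a plain list of layer sizes
# or a single-element list holding a dict with "pi" and/or "qf" keys.
NetArch = List[Union[int, Dict[str, List[int]]]]


def split_off_policy_arch(net_arch: NetArch) -> Tuple[List[int], List[int]]:
    """Return (actor_arch, critic_arch) from an off-policy ``net_arch``.

    Hypothetical helper, assuming shared layers are not supported.
    """
    # Proposed form: net_arch=[dict(pi=[...], qf=[...])]
    if len(net_arch) == 1 and isinstance(net_arch[0], dict):
        spec = net_arch[0]
        unknown = set(spec) - {"pi", "qf"}
        if unknown:
            raise ValueError(f"Unknown net_arch keys: {sorted(unknown)}")
        # A missing key means no hidden layers for that network (assumption).
        return list(spec.get("pi", [])), list(spec.get("qf", []))

    # Plain form: net_arch=[256, 256], same layers for actor and critic.
    if all(isinstance(size, int) for size in net_arch):
        return list(net_arch), list(net_arch)

    # Anything else would mix shared layers with a dict, e.g. [64, dict(...)].
    raise ValueError("Shared layers are not supported for off-policy algorithms")
```

With this, `split_off_policy_arch([dict(pi=[64], qf=[400, 300])])` would give the actor one hidden layer of 64 units and the critic two hidden layers of 400 and 300 units, while a mixed spec such as `[64, dict(pi=[64])]` is rejected.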