Experiment with adding an optional mode that sidesteps hand-encoded conversion logic between formats and instead uses an LLM to convert software metadata from one format to another, receiving structured data that is deserialized into codemeticulous/pydantic objects.
Benefits are twofold:
- avoid writing conversion logic by hand
- potentially infer or preserve more information than would otherwise be feasible
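The core loop of this mode is: prompt the model, get JSON back, and deserialize it into a typed object, rejecting anything malformed. Below is a minimal, stdlib-only sketch of the deserialization half. `CodeMetaStub` and `parse_llm_reply` are hypothetical names, and the stub has only four fields. In codemeticulous the target would be the real pydantic model, where (assuming pydantic v2) `CodeMeta.model_validate_json(reply)` could replace the hand-written checks.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional

FENCE = "`" * 3  # Markdown code fence that models sometimes wrap JSON in


@dataclass
class CodeMetaStub:
    """Hypothetical, heavily reduced stand-in for the CodeMeta pydantic model."""

    name: str
    description: Optional[str] = None
    version: Optional[str] = None
    keywords: List[str] = field(default_factory=list)


def parse_llm_reply(reply: str) -> CodeMetaStub:
    """Deserialize the LLM's JSON reply, raising ValueError if it is unusable."""
    text = reply.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (e.g. a json-tagged fence) and the closing fence.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    data = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("name"), str):
        raise ValueError("missing required string field 'name'")
    allowed = {"name", "description", "version", "keywords"}
    # Ignore keys the stub does not define rather than failing the whole record.
    return CodeMetaStub(**{k: v for k, v in data.items() if k in allowed})
```

Dropping unknown keys is one choice; a stricter variant could reject them so that hallucinated fields surface as errors instead of disappearing silently.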
There are two possible sub-modes:

1. Converting from unstructured data or an unknown format: the data itself is the only context, and we ask the LLM to extract anything it can.
```python
with open("codebase-detail.html") as f:
    raw_detail_page = f.read()

convert_ai(
    source_format="unstructured",
    target_format="codemeta",
    model="openai/gpt-4o",
    source_data=raw_detail_page,
)
# returns a codemeticulous.codemeta.models.CodeMeta object supposedly describing the software record in codebase-detail.html
```
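In the unstructured sub-mode, a raw detail page is mostly markup. One possible preprocessing step, which is an assumption and not part of the proposal, is to reduce the HTML to visible text before building the prompt, saving tokens. This sketch uses only the standard library's `html.parser`; `html_to_text` is a hypothetical helper name.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.parts: list = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-blank text that is not inside a script or style element.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(raw_html: str) -> str:
    """Hypothetical helper: reduce an HTML page to newline-separated visible text."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return "\n".join(parser.parts)
```

Stripping markup also drops attributes such as link targets (e.g. a repository URL in an `href`), which may be exactly the metadata worth extracting, so whether to preprocess at all is a trade-off.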
2. Converting from known formats (CodeMeta, Citation File Format, etc.): the prompt can be contextualized with a known schema, term definitions/explanations, etc.
```python
datacite_representation = DataCite(...)

convert_ai(
    source_format="datacite",
    target_format="codemeta",
    model="openai/gpt-4o",
    source_data=datacite_representation,
)
# returns a codemeticulous.codemeta.models.CodeMeta object supposedly describing the same software as datacite_representation
```
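For known formats, the prompt can carry the target schema and definitions of the source terms. Here is a sketch of that prompt assembly. It assumes pydantic v2 models, so `source_json` could come from `datacite_representation.model_dump_json()` and `target_schema` from `CodeMeta.model_json_schema()`; `build_known_format_prompt` and the glossary layout are hypothetical.

```python
import json


def build_known_format_prompt(
    source_format: str,
    target_format: str,
    source_json: str,
    target_schema: dict,
    term_definitions: dict,
) -> str:
    """Hypothetical: assemble a prompt with the target JSON Schema and a term glossary."""
    # One "- term: meaning" line per source-format term, sorted for stable prompts.
    glossary = "\n".join(
        f"- {term}: {meaning}" for term, meaning in sorted(term_definitions.items())
    )
    return (
        f"Convert the following {source_format} metadata to {target_format}.\n"
        "Respond with a single JSON object that validates against this JSON Schema:\n"
        f"{json.dumps(target_schema, indent=2)}\n\n"
        f"Definitions of {source_format} terms:\n{glossary}\n\n"
        f"Source metadata:\n{source_json}\n"
    )
```

Sorting the glossary keeps the prompt deterministic for a given input, which makes conversions easier to compare across runs.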
The same two sub-modes from the command line:

1. Converting from unstructured data or an unknown format:

```sh
codemeticulous convert --ai --model=openai/gpt-4o --key=abc123 --to codemeta --unstructured codebase-detail.html > codemeta.json
```

2. Converting from known formats (CodeMeta, Citation File Format, etc.):

```sh
codemeticulous convert --ai --model=openai/gpt-4o --key=abc123 --to codemeta --from datacite datacite.json > codemeta.json
```

Note: consider supporting multiple inputs, e.g. `[CITATION.cff, LICENSE, AUTHORS, github_repo] -> codemeta`.
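The proposed CLI flags could be wired up with `argparse` roughly as below; this is a hypothetical sketch, not codemeticulous' actual CLI. `--from` is stored as `source_format` because `from` is a Python keyword, `--from` and `--unstructured` are mutually exclusive, and the positional `inputs` accepts several files to leave room for multi-input. `combine_inputs` is a hypothetical helper that labels each input before everything goes into a single prompt.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the proposed `convert --ai` flags."""
    parser = argparse.ArgumentParser(prog="codemeticulous")
    sub = parser.add_subparsers(dest="command", required=True)
    convert = sub.add_parser("convert")
    convert.add_argument("--ai", action="store_true", help="use an LLM for the conversion")
    convert.add_argument("--model", help="model identifier, e.g. openai/gpt-4o")
    convert.add_argument("--key", help="API key for the model provider")
    convert.add_argument("--to", dest="target_format", required=True)
    # Exactly one of --from FORMAT or --unstructured must be given.
    source = convert.add_mutually_exclusive_group(required=True)
    source.add_argument("--from", dest="source_format")  # "from" is a Python keyword
    source.add_argument("--unstructured", action="store_true")
    # nargs="+" accepts one or more input files (room for multi-input).
    convert.add_argument("inputs", nargs="+", type=Path)
    return parser


def combine_inputs(sources: dict) -> str:
    """Hypothetical: label each input so the LLM can tell which source a fact came from."""
    return "\n\n".join(f"=== {name} ===\n{text}" for name, text in sources.items())
```

Passing `--key` on the command line leaves the key in shell history, so reading it from an environment variable may be worth considering as an alternative.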