@Sample – Sample Data Retrieval and Automatic Type Inference (Design)¶
English | 日本語
Purpose¶
Automatically infer static types from sample data based on declarations like @Sample("users.json") val users, providing end-to-end processing from sample data to Java 25 source generation. Users can obtain safely typed code and initial data by simply placing small samples.
Overview¶
- Supported schemas: File paths (relative/absolute),
file://,https://,s3:// - Operation mode: Switch with
modeoption (default:Embed) Embed(default): Retrieve and analyze sample data at compile time, embed into generated codeLoad: Generate code (loader) that loads sample data at runtime- Additional options (optional):
format:json|csv|tsv|auto(defaultauto, inferred from extension/Content-Type)sha256: Content verification hash (replay/tampering prevention)limitBytes: Embedding limit (default 1 MiB, only forEmbed)
Examples:
@Sample("users.json") val users // => mode=Embed (default)
@Sample("https://example.com/users.json", mode=Load) val users
@Sample(source="s3://bucket/key.json", mode=Embed, sha256="<hex>") val users
Type Inference¶
- JSON objects → Generate
record(immutable) or regular class (mutable) (jv uses data class → Java record as basis) - JSON arrays → Infer as
List<T>. Element unions compose least upper bound with structural commonality (missing fields represented asOptional<T>/nullable) - Primitive inference:
- Integers:
int/long/BigInteger(selected by range) - Decimals:
double/BigDecimal(selected by digits/precision) - String/boolean/null →
String/boolean/Optional<T>(or nullable) - CSV/TSV infers first row header as field names, subsequent rows as record columns. Types determined from entire column.
Embed (Compile-time Embedding)¶
- Retrieve data at compile time → parse → serialize and embed into Java source
- Large volume countermeasure: Build failure on
limitBytesexcess or automatic fallback toLoad(--sample-embed-fallback=load|fail) - Generated artifacts:
record/class groups corresponding to inferred schema- Initialization code like
public static final List<User> users = List.of( … ); - Network default: Disabled by default. Explicitly allow with
--sample-network=alloworjv.toml. Recommendsha256when usinghttps:///s3://.
Load (Runtime Loading)¶
- Generated code uses the following loaders:
file:///path:java.nio.file.Fileshttps:///http://: Java 11+HttpClients3://: Not bundled by default (zero-dependency policy). Only integrated whenjv_s3optional extension is available. Recommended to issue presigned URLs in advance and usehttps://.- Exception/retry: Simple retry (exponential backoff) and explicit throwing of
IOException/HttpTimeoutException - Cache: Optional disk cache at
~/.cache/jv/sample/<sha256>(--sample-cache=on|off)
Security¶
- Default: Network retrieval disabled (both
Embed/Load) - Support
sha256pinning (error on mismatch) s3://not bundled by default due to credential handling. Explicit introduction via optional extension only when needed.
Configuration/CLI¶
- Annotation:
@Sample(source: String, mode: Embed|Load = Embed, format: String = "auto", sha256: String? = null, limitBytes: Int? = null)jv.toml(example):- CLI flags (
jv build): --sample-mode=embed|load(override default)--sample-network=allow|deny(default deny)--sample-embed-max-bytes=N--sample-embed-fallback=load|fail--sample-cache=on|off
Implementation Approach (Stages)¶
1) Introduce annotation representation to parser/AST (@Name(args) attached to val/var/class etc.)
2) Detect @Sample in IR → pass sample source to type inference phase
3) Implement type inferrer (JSON/CSV), generate record/class definitions and initialization expressions
4) Embed path: Embed initialization expressions into Java AST
5) Load path: Generate loader (file/http) with minimal dependencies, deserialize → map to types
6) CLI/configuration integration, security (network permission/hash verification/capacity limits/cache)
Error Design¶
- Source unreachable/404/permission errors: Output diagnostics with specific URI and hints
- Type mismatch (significant inconsistency in array elements): Present least upper bound inference failure reasons, allow mitigation with
--sample-infer=loose|strict - Embedding size excess: Recommend countermeasures (subsetting, switch to
Load, increaseembed_max_bytes)
Examples¶
// Embed local JSON with default (Embed)
@Sample("data/users.json") val users
// Load from HTTP at runtime
@Sample("https://example.com/users.json", mode=Load) val users
// Retrieve s3 at compile time with hash verification and embed
@Sample(source="s3://my-bucket/users.json", mode=Embed, sha256="ab12…") val users
This specification provides advance definition of documentation and CLI consensus interfaces, with language-side (AST/IR/code generation) implementation to be added in stages.