9/29/2025

AVRO

 

Updated Guide: Apache Avro™ with Java 

This is a short guide for getting started with Apache Avro using Java. It covers only using Avro for data serialization.


📥 Download

Avro implementations for C, C++, C#, Java, PHP, Python, and Ruby can be downloaded from the Apache Avro™ Download page. This guide uses Avro 1.12.0, the latest version at the time of writing. For the examples in this guide, download avro-1.12.0.jar and avro-tools-1.12.0.jar.

Alternatively, if you are using Maven, add the following dependency to your pom.xml:

<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro</artifactId>
  <version>1.12.0</version>
</dependency>

Then add the Avro Maven plugin (which performs code generation) and the Maven compiler settings to the <build><plugins> section of your pom.xml:

<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <version>1.12.0</version>
  <configuration>
    <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
    <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
  </configuration>
  <executions>
    <execution>
      <phase>generate-sources</phase>
      <goals>
        <goal>schema</goal>
      </goals>
    </execution>
  </executions>
</plugin>
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <configuration>
    <source>1.8</source>
    <target>1.8</target>
  </configuration>
</plugin>

You may also build the required Avro jars from source, but building Avro is beyond the scope of this guide.


🧾 Defining a Schema

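Avro schemas are defined using JSON or Avro IDL (IDL support requires an extra dependency). Schemas are composed of primitive types (null, boolean, int, long, float, double, bytes, and string) and complex types (record, enum, array, map, union, and fixed).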

Here’s a simple schema example, student.avsc:

{
  "namespace": "example.avro",
  "type": "record",
  "name": "Student",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_number", "type": ["int", "null"]},
    {"name": "favorite_color", "type": ["string", "null"]}
  ]
}

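This schema defines a record representing a hypothetical student. At minimum, a record definition must include its type ("type": "record"), a name ("name": "Student"), and its fields, here name, favorite_number, and favorite_color. The namespace ("example.avro") combines with the name to form the schema's full name, example.avro.Student. Each field is an object with a name and a type; the union types ["int", "null"] and ["string", "null"] let favorite_number and favorite_color hold either a value or null, which makes them optional.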
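To see how Avro itself interprets this schema, the sketch below parses an inline copy of student.avsc with Schema.Parser and reports which fields accept null (that is, which fields are unions containing "null"). The class name StudentSchemaInspector and its helper methods are hypothetical, written for illustration; only Schema, Schema.Parser, and Schema.Field come from Avro.

```java
import org.apache.avro.Schema;

import java.util.ArrayList;
import java.util.List;

public class StudentSchemaInspector {

    // The student.avsc schema, inlined so the example does not need a file on disk.
    static final String STUDENT_SCHEMA_JSON =
        "{\"namespace\": \"example.avro\", \"type\": \"record\", \"name\": \"Student\", \"fields\": ["
        + "{\"name\": \"name\", \"type\": \"string\"},"
        + "{\"name\": \"favorite_number\", \"type\": [\"int\", \"null\"]},"
        + "{\"name\": \"favorite_color\", \"type\": [\"string\", \"null\"]}]}";

    /** True if the field type is "null" itself or a union with a "null" branch. */
    static boolean isNullable(Schema fieldSchema) {
        if (fieldSchema.getType() != Schema.Type.UNION) {
            return fieldSchema.getType() == Schema.Type.NULL;
        }
        for (Schema branch : fieldSchema.getTypes()) {
            if (branch.getType() == Schema.Type.NULL) {
                return true;
            }
        }
        return false;
    }

    /** Names of the record's fields that may hold null, in schema order. */
    static List<String> optionalFields(Schema record) {
        List<String> names = new ArrayList<>();
        for (Schema.Field field : record.getFields()) {
            if (isNullable(field.schema())) {
                names.add(field.name());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Schema schema = new Schema.Parser().parse(STUDENT_SCHEMA_JSON);
        System.out.println("Full name: " + schema.getFullName());
        for (Schema.Field field : schema.getFields()) {
            // field.schema() prints the field's type as JSON, e.g. a union array
            System.out.println(field.name() + " -> " + field.schema());
        }
        System.out.println("Optional fields: " + optionalFields(schema));
    }
}
```

Running it should report example.avro.Student as the full name and list favorite_number and favorite_color as optional, while name is required.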


⚙️ Serializing and Deserializing with Code Generation

Compiling the Schema

You can generate Java classes from the schema using avro-tools:

java -jar /path/to/avro-tools-1.12.0.jar compile schema student.avsc .

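This generates the source files in a package based on the schema's namespace; for student.avsc, that is example/avro/Student.java under the destination directory (the current directory in the command above). With the Maven plugin configured as shown earlier, the same generation runs automatically in the generate-sources phase, so mvn compile regenerates the classes from the schemas in src/main/avro/.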


👩‍💻 Creating Students

// Constructor plus setters; favorite_color is left null
Student student1 = new Student();
student1.setName("Alyssa");
student1.setFavoriteNumber(256);

// Alternate constructor: name, favorite_number, favorite_color
Student student2 = new Student("Ben", 7, "red");

// Construct via builder
Student student3 = Student.newBuilder()
    .setName("Charlie")
    .setFavoriteColor("blue")
    .setFavoriteNumber(null)
    .build();

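Avro objects can be created either by invoking a constructor directly or by using a builder. Builders automatically set any default values specified in the schema and validate data as it is set; constructors skip those steps and are therefore more efficient.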
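Builder validation also explains why student3 calls setFavoriteNumber(null) explicitly: favorite_number declares no default in the schema, so a builder rejects a record that leaves it unset. The generated Student.Builder performs this check; the sketch below demonstrates the same behavior with Avro's GenericRecordBuilder so that it runs without the generated class. The class BuilderValidationDemo and its tryBuild helper are hypothetical names.

```java
import org.apache.avro.AvroRuntimeException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecordBuilder;

public class BuilderValidationDemo {

    // Same schema as student.avsc, inlined for a self-contained example.
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"namespace\": \"example.avro\", \"type\": \"record\", \"name\": \"Student\", \"fields\": ["
        + "{\"name\": \"name\", \"type\": \"string\"},"
        + "{\"name\": \"favorite_number\", \"type\": [\"int\", \"null\"]},"
        + "{\"name\": \"favorite_color\", \"type\": [\"string\", \"null\"]}]}");

    /** Builds the record, or returns null if the builder rejects it. */
    static GenericData.Record tryBuild(GenericRecordBuilder builder) {
        try {
            return builder.build();
        } catch (AvroRuntimeException e) {
            System.out.println("Rejected: " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        // Every field set (null is allowed for the union fields): builds successfully.
        GenericData.Record charlie = tryBuild(new GenericRecordBuilder(SCHEMA)
            .set("name", "Charlie")
            .set("favorite_color", "blue")
            .set("favorite_number", null));
        System.out.println(charlie);

        // favorite_number never set and no default in the schema: build() throws.
        tryBuild(new GenericRecordBuilder(SCHEMA)
            .set("name", "Dana")
            .set("favorite_color", "green"));

        // The direct-construction path does no such check; unset fields simply stay null.
        GenericData.Record unchecked = new GenericData.Record(SCHEMA);
        unchecked.put("name", "Dana");
        System.out.println(unchecked);
    }
}
```

Adding "default": null to a union field (with "null" listed as the first branch) is the usual way to make such a field optional for builders as well.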


💾 Serializing

// Serialize student1, student2, and student3 to disk
DatumWriter<Student> studentDatumWriter = new SpecificDatumWriter<>(Student.class);
try (DataFileWriter<Student> dataFileWriter = new DataFileWriter<>(studentDatumWriter)) {
    dataFileWriter.create(student1.getSchema(), new File("students.avro"));
    dataFileWriter.append(student1);
    dataFileWriter.append(student2);
    dataFileWriter.append(student3);
} // try-with-resources closes the writer, flushing the file

📂 Deserializing

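// Deserialize Students from disk
File file = new File("students.avro");
DatumReader<Student> studentDatumReader = new SpecificDatumReader<>(Student.class);
try (DataFileReader<Student> dataFileReader = new DataFileReader<>(file, studentDatumReader)) {
    Student student = null;
    while (dataFileReader.hasNext()) {
        // Pass the previous object to next() so Avro can reuse it instead of
        // allocating a new Student for every record
        student = dataFileReader.next(student);
        System.out.println(student);
    }
}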

Output:

{"name": "Alyssa", "favorite_number": 256, "favorite_color": null}
{"name": "Ben", "favorite_number": 7, "favorite_color": "red"}
{"name": "Charlie", "favorite_number": null, "favorite_color": "blue"}
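An Avro data file stores the writer's schema in its header, so a reader can decode the file without being given a schema in advance. The sketch below writes one generic Student record to a temporary file, then reads it back with a GenericDatumReader constructed without any schema and asks the reader for the schema it found. ReadWithoutSchema, writeSample, and readWriterSchema are hypothetical names; the Avro calls (DataFileWriter.create, DataFileReader.getSchema, and the no-argument GenericDatumReader constructor) are standard.

```java
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

import java.io.File;
import java.io.IOException;

public class ReadWithoutSchema {

    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"namespace\": \"example.avro\", \"type\": \"record\", \"name\": \"Student\", \"fields\": ["
        + "{\"name\": \"name\", \"type\": \"string\"},"
        + "{\"name\": \"favorite_number\", \"type\": [\"int\", \"null\"]},"
        + "{\"name\": \"favorite_color\", \"type\": [\"string\", \"null\"]}]}");

    /** Writes a single record to an Avro container file; the schema goes into the header. */
    static void writeSample(File file) throws IOException {
        GenericRecord alyssa = new GenericData.Record(SCHEMA);
        alyssa.put("name", "Alyssa");
        alyssa.put("favorite_number", 256);
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(SCHEMA))) {
            writer.create(SCHEMA, file);
            writer.append(alyssa);
        }
    }

    /** Reads the file with no schema supplied and returns the schema found in its header. */
    static Schema readWriterSchema(File file) throws IOException {
        // No-arg reader: it adopts the writer's schema from the file header.
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(file, new GenericDatumReader<GenericRecord>())) {
            GenericRecord record = null;
            while (reader.hasNext()) {
                record = reader.next(record); // reuse the previous record object
                System.out.println(record);
            }
            return reader.getSchema();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("students", ".avro");
        file.deleteOnExit();
        writeSample(file);
        System.out.println("Schema from header: " + readWriterSchema(file).getFullName());
    }
}
```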

🧪 Serializing and Deserializing Without Code Generation

Alternatively, you can skip code generation entirely by using GenericRecord: the schema is parsed at runtime and fields are accessed by name.

Schema schema = new SchemaParser().parse(new File("student.avsc")).mainSchema();

GenericRecord student1 = new GenericData.Record(schema);
student1.put("name", "Alyssa");
student1.put("favorite_number", 256);

GenericRecord student2 = new GenericData.Record(schema);
student2.put("name", "Ben");
student2.put("favorite_number", 7);
student2.put("favorite_color", "red");
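Generic records are only loosely checked while you fill them in: put() rejects field names that are not in the schema, but a value of the wrong type is accepted and typically fails later, when the record is written. A minimal sketch, assuming you want to check records up front with GenericData.validate; GenericRecordChecks and isValid are hypothetical names.

```java
import org.apache.avro.AvroRuntimeException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class GenericRecordChecks {

    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"namespace\": \"example.avro\", \"type\": \"record\", \"name\": \"Student\", \"fields\": ["
        + "{\"name\": \"name\", \"type\": \"string\"},"
        + "{\"name\": \"favorite_number\", \"type\": [\"int\", \"null\"]},"
        + "{\"name\": \"favorite_color\", \"type\": [\"string\", \"null\"]}]}");

    /** True if every field value matches its schema type (required fields must be non-null). */
    static boolean isValid(GenericRecord record) {
        return GenericData.get().validate(record.getSchema(), record);
    }

    public static void main(String[] args) {
        GenericRecord alyssa = new GenericData.Record(SCHEMA);
        alyssa.put("name", "Alyssa");
        alyssa.put("favorite_number", 256); // favorite_color stays null, which its union allows
        System.out.println("alyssa valid? " + isValid(alyssa));

        GenericRecord bad = new GenericData.Record(SCHEMA);
        bad.put("name", "Ben");
        bad.put("favorite_number", "seven"); // a String where the union expects int or null
        System.out.println("bad valid? " + isValid(bad));

        try {
            bad.put("favourite_color", "red"); // misspelled field name
        } catch (AvroRuntimeException e) {
            System.out.println("put() rejected: " + e.getMessage());
        }
    }
}
```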

Serialize:

DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<>(schema);
try (DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<>(datumWriter)) {
    dataFileWriter.create(schema, new File("students.avro"));
    dataFileWriter.append(student1);
    dataFileWriter.append(student2);
}

Deserialize:

File file = new File("students.avro");
DatumReader<GenericRecord> datumReader = new GenericDatumReader<>(schema);
try (DataFileReader<GenericRecord> dataFileReader = new DataFileReader<>(file, datumReader)) {
    GenericRecord student = null;
    while (dataFileReader.hasNext()) {
        // Reuse the previous record object to avoid an allocation per record
        student = dataFileReader.next(student);
        System.out.println(student);
    }
}

Output:

{"name": "Alyssa", "favorite_number": 256, "favorite_color": null}
{"name": "Ben", "favorite_number": 7, "favorite_color": "red"}
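A common surprise when reading records back: by default, Avro's GenericDatumReader decodes string fields as org.apache.avro.util.Utf8, not java.lang.String, so comparing them with String.equals fails. The sketch below round-trips a record through an in-memory container (DataFileWriter writing into a ByteArrayOutputStream, read back with DataFileStream), which is also handy when the bytes go over a network instead of to a file. InMemoryRoundTrip and its methods are hypothetical names.

```java
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class InMemoryRoundTrip {

    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"namespace\": \"example.avro\", \"type\": \"record\", \"name\": \"Student\", \"fields\": ["
        + "{\"name\": \"name\", \"type\": \"string\"},"
        + "{\"name\": \"favorite_number\", \"type\": [\"int\", \"null\"]},"
        + "{\"name\": \"favorite_color\", \"type\": [\"string\", \"null\"]}]}");

    /** Serializes records into an in-memory Avro container (header plus data blocks). */
    static byte[] write(List<GenericRecord> records) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(SCHEMA))) {
            writer.create(SCHEMA, out);
            for (GenericRecord r : records) {
                writer.append(r);
            }
        }
        return out.toByteArray();
    }

    /** Reads all records back; iterating (no reuse object) yields a fresh record each time. */
    static List<GenericRecord> read(byte[] bytes) throws IOException {
        List<GenericRecord> result = new ArrayList<>();
        try (DataFileStream<GenericRecord> stream = new DataFileStream<>(
                 new ByteArrayInputStream(bytes), new GenericDatumReader<GenericRecord>(SCHEMA))) {
            for (GenericRecord r : stream) {
                result.add(r);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        GenericRecord ben = new GenericData.Record(SCHEMA);
        ben.put("name", "Ben");
        ben.put("favorite_number", 7);
        ben.put("favorite_color", "red");

        GenericRecord copy = read(write(Collections.singletonList(ben))).get(0);
        Object name = copy.get("name");
        // Decoded strings are org.apache.avro.util.Utf8 by default
        System.out.println(name.getClass().getName());
        // String.equals() is false against a Utf8, so compare via toString()
        System.out.println("Ben".equals(name) + " / " + "Ben".equals(name.toString()));
    }
}
```

If you would rather receive java.lang.String, Avro honors an "avro.java.string": "String" property on string types in the schema.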

🚀 Compiling and Running the Example

Navigate to the project directory and run:

$ mvn compile
$ mvn -q exec:java -Dexec.mainClass=example.SpecificMain

For the generic version:

$ mvn compile
$ mvn -q exec:java -Dexec.mainClass=example.GenericMain

⚡ Beta Feature: Faster Code Generation

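To enable faster encoding and decoding for generated (specific) classes, set the org.apache.avro.specific.use_custom_coders system property when running the example: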

$ mvn -q exec:java -Dexec.mainClass=example.SpecificMain \
    -Dorg.apache.avro.specific.use_custom_coders=true

No schema recompilation is required. This is a runtime toggle via a system property.
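Because the toggle is a plain system property, it is easy to mistype or drop from a launch script. One way to confirm it took effect is to ask Avro's default SpecificData model, whose useCustomCoders() method reports the current setting. A minimal sketch; CustomCodersCheck is a hypothetical class name, and the property name is the one used in the command above.

```java
import org.apache.avro.specific.SpecificData;

public class CustomCodersCheck {

    // The system property from the command above.
    static final String PROP = "org.apache.avro.specific.use_custom_coders";

    public static void main(String[] args) {
        // What was passed on the command line (null if the -D flag was omitted)
        System.out.println(PROP + " = " + System.getProperty(PROP));
        // What Avro's default specific data model will actually use
        System.out.println("useCustomCoders() = " + SpecificData.get().useCustomCoders());
    }
}
```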


🧠 Summary

You’ve now learned how to:

  • Define a schema for a Student

  • Serialize/deserialize using Avro with and without code generation

  • Use Avro tools and Maven integration

  • Optimize performance with reuse patterns and optional beta features

